Approximate Bayesian computation (ABC) is a popular technique
for approximating likelihoods and is often used in parameter
estimation when the likelihood functions are analytically intractable. 
Although the use of ABC is widespread in many fields, there has been 
little investigation of the theoretical properties of the resulting esti- 
mators. In this paper we give a theoretical analysis of the asymptotic 
properties of ABC based maximum likelihood parameter estimation 
for hidden Markov models. In particular, we derive results analogous 
to those of consistency and asymptotic normality for standard max- 
imum likelihood estimation. We also discuss how Sequential Monte 
Carlo methods provide a natural method for implementing likelihood 
based ABC procedures. 



1. Introduction. The hidden Markov model (HMM) is an important 
statistical model in many fields including Bioinformatics (e.g. Durbin et al. 
(1998)), Econometrics (e.g. Kim, Shephard and Chib (1998)) and Population
genetics (e.g. Felsenstein and Churchill (1996)); see also Cappe, Ryden
and Moulines (2005) for a recent overview. Often one has a range of HMMs
parameterised by a parameter vector $\theta$ taking values in some compact
subset $\Theta$ of Euclidean space. Given a sequence of observations
$\hat{Y}_1, \ldots, \hat{Y}_n$ the objective is to find the parameter vector
$\theta^* \in \Theta$ that corresponds to the particular HMM from which the
data were generated.

A common approach to estimating $\theta^*$ is maximum likelihood estimation
(MLE). The parameter estimate, denoted $\hat{\theta}_n$, is obtained via
maximizing the log-likelihood of the observations:

$$\hat{\theta}_n = \arg\max_{\theta \in \Theta} l_n(\theta)$$

where

$$l_n(\theta) := \log p_\theta(\hat{Y}_1, \ldots, \hat{Y}_n) = \sum_{i=1}^{n} \log p_\theta(\hat{Y}_i \mid \hat{Y}_1, \ldots, \hat{Y}_{i-1}).$$



Unless the model is simple, e.g. linear Gaussian or when $\mathcal{X}$ is a
finite set, one can seldom evaluate the likelihood analytically. There are a
variety of techniques, for example sequential Monte Carlo (SMC), for
numerically estimating the likelihood. However, in a wide range of
applications these methods cannot be used, for example when the conditional
density of the observed state of the HMM given the hidden state is
intractable, by which we mean that this density cannot be evaluated
analytically and has no unbiased Monte Carlo estimator. Despite this, one is
often still able to generate samples from the corresponding processes for
different values of the parameter $\theta$ (e.g. Jasra et al. (2010)). This
has led to the development of methods in which $\theta^*$ is estimated by
taking the value of $\theta$ which maximizes some principled approximation of
the likelihood which is itself estimated using Monte Carlo simulation.

One such approach is the convolution particle filter of Campillo and Rossi 
(2009). Another technique which can be applied to this class of problems is 
indirect inference; see Gourieroux, Monfort and Renault (1993) and Hegg- 
land and Frigessi (2004). However in the context of HMMs, when one does 
not adopt a linear Gaussian approximation of the filtering density (which 
can be very inaccurate, as in extended Kalman filter approximations), this 
method is likely to be very expensive. A third method which has recently 
received a great deal of attention is approximate Bayesian computation 
(ABC). A non-exhaustive list of references includes: McKinley, Cook and
Deardon (2009); Peters, Wüthrich and Shevchenko (2010); Pritchard et al.
(1999); Ratmann et al. (2009); Tavaré et al. (1997). See also Sisson and Fan
(to be published) for a review on computational methodology.

In the standard ABC approach (omitting for the moment the possible use
of summary statistics) one assumes that a data set
$\hat{Y}_1, \ldots, \hat{Y}_n$ is given and approximates the likelihood
function $p_\theta(\hat{Y}_1, \ldots, \hat{Y}_n)$ via probabilities of the
form

$$\mathbb{P}_\theta\left( d(Y_1, \ldots, Y_n; \hat{Y}_1, \ldots, \hat{Y}_n) \le \epsilon \right) \qquad (1)$$

where $\{Y_k\}_{k \ge 1}$ denotes the observed state of the HMM,
$d(\cdot\,; \cdot)$ is some suitable metric on the n-fold product space
$\mathbb{R}^m \times \cdots \times \mathbb{R}^m$ and $\epsilon > 0$ is a
constant which reflects the accuracy of the approximation. In practice these
probabilities are themselves estimated using Monte Carlo techniques.

The intuitive justification for the ABC approximation is that for
sufficiently small $\epsilon$

$$p_\theta(\hat{Y}_1, \ldots, \hat{Y}_n) \approx \frac{\mathbb{P}_\theta\left( d(Y_1, \ldots, Y_n; \hat{Y}_1, \ldots, \hat{Y}_n) \le \epsilon \right)}{V^{\epsilon}_{\hat{Y}_1, \ldots, \hat{Y}_n}}$$

where $V^{\epsilon}_{\hat{Y}_1, \ldots, \hat{Y}_n}$ denotes the volume of the
d-ball of radius $\epsilon$ around the points $\hat{Y}_1, \ldots, \hat{Y}_n$.
Thus the probabilities (1) will provide a good approximation to the
likelihood, up to the value of some renormalising factor which is
independent of $\theta$ and hence can be ignored. However in general it is
not at all clear in what sense an approximation to the likelihood must be
'good' in order for the resulting inference procedures to be well behaved.
The purpose of this paper is to resolve this issue by directly investigating
the effect of the parameter $\epsilon$, not on the quality of the
approximations (1), but on the behaviour of the resulting ABC based
parameter estimators.
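
The intuition behind the approximation (1) can be checked numerically in a case where everything is available in closed form. The sketch below is an illustration only and is not taken from the paper: it assumes a single scalar observation with law $N(\theta, 1)$ and the Euclidean metric, so that the volume of the ball is $2\epsilon$. The function names are hypothetical.

```python
import math


def normal_cdf(x):
    """Standard normal distribution function, via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def abc_ratio(y_obs, theta, eps):
    """P_theta(|Y - y_obs| <= eps) divided by the ball volume 2*eps, for Y ~ N(theta, 1)."""
    prob = normal_cdf(y_obs - theta + eps) - normal_cdf(y_obs - theta - eps)
    return prob / (2.0 * eps)


# Toy values (assumptions made for illustration only).
y_obs, theta = 0.3, 1.0
exact = normal_pdf(y_obs - theta)
errors = [abs(abc_ratio(y_obs, theta, eps) - exact) for eps in (1.0, 0.1, 0.01)]
print(errors)  # the gap to the exact density shrinks as eps decreases
```

In this toy case the normalising factor $2\epsilon$ does not depend on $\theta$, which is why it can be dropped when maximising over $\theta$.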

We note that in (1) we have implicitly assumed that one is working with 
the entire data set rather than a summary statistic of it as is usually done 
in practice, especially when the observations $\{Y_k\}_{k \ge 0}$ take values in some
high dimensional space. For ease of exposition we shall persist with this 
assumption throughout the rest of the paper, noting where appropriate the 
conditions under which the results we derive will continue to hold when 
summary statistics are used (see in particular the remarks at the ends of 
Sections 3 and 4). 

1.1. Contribution and Structure. In this paper we investigate the be- 
haviour of ABC when used to estimate the parameters of HMMs for which 
the conditional densities of the observations given the hidden state are in- 
tractable. We shall use a specialization, first proposed in Jasra et al. (2010), 
of the standard ABC likelihood approximation (1) for when the observations 
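are generated by a HMM. Specifically we approximate the likelihood of a
given sequence of observations $\hat{Y}_1, \ldots, \hat{Y}_n$ from a HMM with
the probability

$$\mathbb{P}_\theta\left( Y_1 \in B^{\epsilon}_{\hat{Y}_1}, \ldots, Y_n \in B^{\epsilon}_{\hat{Y}_n} \right) \qquad (2)$$

where $B^{\epsilon}_{y}$ denotes the ball of radius $\epsilon$ centered around the point $y$. The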
benefit of this approach is that it retains the Markovian structure of the 
model. This facilitates both simpler Markov chain Monte Carlo (MCMC) 
(e.g. McKinley, Cook and Deardon (2009)) and sequential Monte Carlo 
(SMC) (e.g. Jasra et al. (2010)) implementation of the ABC approximation.
Furthermore our experience suggests that this approximation is competitive, 
from an accuracy perspective, with a wide range of competing methods; see 
the two aforementioned references for a deeper discussion of this point.
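
Because the approximation (2) keeps the Markovian structure, it can be estimated one observation at a time with a particle filter whose weights are indicators of the $\epsilon$-ball. The following is a minimal sketch of such a filter for scalar states and observations. It is an illustration under stated assumptions rather than the algorithm of Jasra et al. (2010): the function `abc_smc_loglik`, the sampler arguments and the toy AR(1) model in `make_model` are all hypothetical names introduced here.

```python
import numpy as np


def abc_smc_loglik(y_obs, eps, sample_x0, sample_x, sample_y, n_particles, rng):
    """Particle estimate of log P_theta(Y_1 in B^eps_{y_1}, ..., Y_n in B^eps_{y_n}).

    sample_x0(n, rng) -> n draws from the initial law of X_0
    sample_x(x, rng)  -> one-step transition draws given the particles x
    sample_y(x, rng)  -> simulated observations given the hidden states x
    """
    x = sample_x0(n_particles, rng)
    loglik = 0.0
    for y in y_obs:
        x = sample_x(x, rng)                          # propagate hidden states
        y_sim = sample_y(x, rng)                      # simulate pseudo-observations
        w = (np.abs(y_sim - y) <= eps).astype(float)  # indicator of the eps-ball
        mean_w = w.mean()
        if mean_w == 0.0:
            return -np.inf                            # no particle landed in the ball
        loglik += np.log(mean_w)
        idx = rng.choice(n_particles, size=n_particles, p=w / w.sum())
        x = x[idx]                                    # multinomial resampling
    return loglik


def make_model(theta):
    """Toy linear Gaussian HMM (an assumption for illustration): X_k = theta*X_{k-1} + V_k, Y_k = X_k + W_k."""
    sample_x0 = lambda n, rng: rng.normal(size=n)
    sample_x = lambda x, rng: theta * x + rng.normal(size=x.shape)
    sample_y = lambda x, rng: x + rng.normal(size=x.shape)
    return sample_x0, sample_x, sample_y


rng = np.random.default_rng(1)
y_obs = np.array([0.2, -0.1, 0.4, 0.0, 0.3])
ll = abc_smc_loglik(y_obs, 0.5, *make_model(0.7), n_particles=2000, rng=rng)
print(ll)
```

Only the ability to simulate from the model is used; the observation density is never evaluated. An ABC MLE could then be approximated by maximising this estimate over a grid of parameter values, although the Monte Carlo noise in the estimate would have to be controlled.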

One could use the approximate likelihoods (2) to estimate the parameters 
of a HMM in one of two ways. Firstly one could take a Bayesian approach 
and use (2) to construct an approximation to the posterior. This is the ap- 
proach most commonly taken in the literature. Alternatively, as we shall do 
in this paper, one could take a frequentist approach and estimate the param- 
eters of the HMM with the value of the parameter vector which maximizes 
the corresponding approximate likelihood (2) of the observations. We shall 
henceforth term this procedure approximate Bayesian computation maxi- 
mum likelihood estimation (ABC MLE). 

Although the use of ABC has become commonplace there has to date 
been little investigation of the theoretical properties of its use in parameter 
estimation in either the Bayesian or frequentist context. In particular the 
following questions remain to be answered. Is ABC MLE consistent? Do 
ABC based posterior distributions concentrate around the true value of the 
parameter vector? Indeed do ABC based estimators converge to anything at 
all? Although these questions may seem abstract it is well known that even 
the mighty MLE can fail to converge in practice, see Ferguson (1982). Thus 
before ABC can be placed on firm mathematical foundations the questions 
raised above need to be addressed. 

The purpose of this paper is to bridge this theoretical gap in the context of 
maximum likelihood estimation. In particular we develop a theoretical jus- 
tification of the ABC MLE procedure based on its large sample properties 
analogous to that provided for MLE by standard results concerning asymp- 
totic consistency and normality. Our approach to this problem is based on 
the novel observation that ABC MLE can be considered as performing MLE 
using the likelihoods of a collection of perturbed HMMs. This implies that 
the ABC MLE should in some sense inherit its behaviour from the standard 
MLE. Using this observation we first show that unlike the MLE, which is 
asymptotically consistent, the ABC MLE estimator has an innate asymp- 
totic bias. Secondly we show that this bias can be made arbitrarily small 
by choosing sufficiently small values of $\epsilon$. Together these results show that
asymptotically the ABC MLE will converge to the true parameter value with 
a margin of error which can be made arbitrarily small by taking a suitable 
choice of $\epsilon$. Thus our results allow us to develop a rigorous formulation of the
intuitive justification of ABC and in doing so to provide a firm mathematical 
basis for performing ABC based inference. 

We complete the picture by analysing the so called noisy variant (see
e.g. Fearnhead and Prangle (2010)) of ABC MLE. We show that unlike 
the ABC MLE the noisy ABC MLE is always asymptotically consistent. 
This raises the question: does noisy ABC provide us with a 'free pass' when 
performing parameter estimation? Unfortunately the answer in general is 
no. We show that under reasonable conditions the Fisher information of the 
noisy ABC MLE is strictly less than that of the standard MLE. As a result 
we show that the noisy ABC suffers from a relative loss of information and 
hence statistical efficiency. 

As part of these investigations we establish a novel asymptotic missing 
information principle for HMMs with observations perturbed by additive 
uniform noise which may in itself be of independent interest to the reader. 
Finally we remark that although this study is theoretical it is our belief that 
the results presented herein will help provide guidance for future method- 
ological developments in the field. 

This paper is structured as follows. In Section 2 the notation and assump- 
tions are given. In Section 3 we establish some approximate asymptotic 
consistency type results for the standard ABC MLE. In Section 4 results 
concerning the asymptotic consistency and normality of the noisy ABC es- 
timator are presented. An extension of the ABC method using probability 
kernels is discussed in Section 5 and an overview of the use of SMC methods 
to provide a practical way of implementing ABC is presented in Section 6. 
An example is given in Section 7 which provides a qualitative demonstration 
of the behaviour of the ABC estimator predicted in Sections 3 and 4. The 
article is summarized in Section 8. Supporting technical lemmas and proofs 
of some of the theoretical results are housed in the two appendices. 

2. Notation and Assumptions. 

2.1. Notation and Main Assumptions. Throughout this paper we shall 
use lower case letters $x, y, z$ to denote dummy variables and upper case
letters $X, Y, Z$ to denote random variables. Observations of a random
variable will be denoted by $\hat{Y}$.

We shall frequently have to refer to various kinds of finite, infinite
and doubly infinite sequences. For brevity the following shorthand notations
are used. For any pair of integers $k < n$, $Y_{k:n}$ denotes the sequence of
random variables $Y_k, \ldots, Y_n$; $Y_{-\infty:k}$ denotes the sequence
$\ldots, Y_k$; $Y_{n:\infty}$ denotes the sequence $Y_n, \ldots$ and
$Y_{-\infty:k;n:\infty}$ denotes the sequence $\ldots, Y_k; Y_n, \ldots$.
Given a sequence of integers $\ldots, j_0, j_1, \ldots$ and indices $r < s$
we shall let $j_{r:s}$ denote $j_r, j_{r+1}, \ldots, j_{s-1}, j_s$;
$j_{-\infty:r}$ denote $\ldots, j_{r-1}, j_r$ and $j_{s:\infty}$ denote
$j_s, j_{s+1}, \ldots$ respectively. Further we shall also use
$j_{-\infty:\infty}$ to denote the full sequence $\ldots, j_0, j_1, \ldots$.
The two notations defined above will be combined in the following manner.
Given a doubly infinite sequence of random variables
$\ldots, Y_{-1}, Y_0, Y_1, \ldots$, a doubly infinite sequence of integers
$\ldots, j_{-1}, j_0, j_1, \ldots$ and indices $r < s$ we shall let
$Y_{j_{r:s}}$ denote the sequence
$Y_{j_r}, Y_{j_{r+1}}, \ldots, Y_{j_{s-1}}, Y_{j_s}$. The sequences
$Y_{j_{-\infty:r}}$, $Y_{j_{s:\infty}}$ and $Y_{j_{-\infty:\infty}}$ are
defined analogously. Lastly given a measure $\mu$ on a Polish space
$\mathcal{X}$ we let $\int \cdot \, \mu(dx_{1:n})$ denote integration w.r.t.
the n-fold product measure $\mu^{\otimes n}$ on the n-fold product space
$\mathcal{X}^n$. Moreover, given a function
$f(x_1, \ldots, x_n) : \mathcal{X}^n \to \mathbb{R}$ and integers
$1 \le k < l \le n$, we shall let $\int f(\cdot)\, \mu(dx_{1:k;l:n})$ denote
the partial integrals
$\int f(\cdot)\, \mu(dx_1) \cdots \mu(dx_k)\, \mu(dx_l) \cdots \mu(dx_n)$.

The essence of our approach is to show that in some sense the ABC 
MLE inherits the properties of the standard MLE. Thus we shall operate 
under assumptions on the HMMs that are sufficient to ensure asymptotic 
consistency and normality of the MLE. 

It is assumed that the Markov chain $\{X_k\}_{k \ge 0}$ is time-homogeneous
and takes values in a compact Polish space $\mathcal{X}$ with associated
Borel $\sigma$-field $\mathcal{B}(\mathcal{X})$. Throughout it will be
assumed that we have a collection of HMMs all defined on the same state
space and parametrised by some vector $\theta$ taking values in a compact
set $\Theta \subset \mathbb{R}^d$. Furthermore we shall reserve $\theta^*$ to
denote the 'true' value of the parameter vector. For each
$\theta \in \Theta$ we let $Q_\theta(x, \cdot)$ denote the transition kernel
of the corresponding Markov chain and for each $x \in \mathcal{X}$ and
$\theta \in \Theta$ we assume that $Q_\theta(x, \cdot)$ has a density
$q_\theta(x, \cdot)$ w.r.t. some common finite dominating measure $\mu$ on
$\mathcal{X}$. The initial distribution of the hidden state will be denoted
by $\pi_0$.

We also assume that the observations $\{Y_k\}_{k \ge 0}$ take values in a
state space $\mathcal{Y} \subset \mathbb{R}^m$ for some $m \ge 1$.
Furthermore, for each $k$ we assume that the random variable $Y_k$ is
conditionally independent of $X_{-\infty:k-1;k+1:\infty}$ and
$Y_{-\infty:k-1;k+1:\infty}$ given $X_k$ and that the conditional laws have
densities $g_\theta(y|x)$ w.r.t. some common finite dominating measure
$\nu$. We further assume that for every $\theta$ the joint chain
$\{X_k, Y_k\}_{k \ge 0}$ is positive Harris recurrent and has a unique
invariant distribution $\pi_\theta$. We shall write
$\overline{\mathbb{P}}_\theta$ to denote the laws of the corresponding
stationary processes and $\overline{\mathbb{E}}_\theta$ to denote
expectations with respect to the stationary laws
$\overline{\mathbb{P}}_\theta$.

Given any $\epsilon > 0$ and $y \in \mathbb{R}^m$ let $B^{\epsilon}_{y}$
denote the closed ball of radius $\epsilon$ centered on the point $y$ and let
$\mathcal{U}_{B^{\epsilon}_{y}}$ denote the uniform distribution on
$B^{\epsilon}_{y}$. For any $A \subset \mathbb{R}^m$, let $\mathbb{I}_A$
denote the indicator function of $A$. Additionally, for any square matrix
$M \in \mathbb{R}^{m \times m}$, we shall let $\|M\|$ denote the Frobenius
norm $\|M\| = \big( \sum_{i,j} M_{i,j}^2 \big)^{1/2}$.

For any two probability measures $\mu_1, \mu_2$ on a measurable space
$(E, \mathcal{E})$ we let $\|\mu_1 - \mu_2\|_{TV}$ denote the total variation
distance between them. For all $p \in [1, \infty)$ we let $L^p(\mu)$ denote
the set of real valued measurable functions satisfying
$\int |f(x)|^p \, \mu(dx) < \infty$.

Finally, we note that the asymptotic results that we prove for the ABC 
MLE and its noisy variant hold independently of the initial condition of the 
hidden state process $\{X_k\}_{k \ge 0}$. Thus, in order to keep the presentation as
concise as possible we shall suppress the presence of the initial condition of 
the hidden state except in those instances where it needs to be referred to 
explicitly. 

2.2. Particular Assumptions. In addition to the assumptions above, the
following assumptions are made at various points in the article.
Assumptions (A1)-(A3) below are sufficient to guarantee asymptotic
consistency of the MLE and (A4)-(A5) ensure the existence of an asymptotic
Fisher information matrix, denoted $I(\theta^*)$. Further, if the asymptotic
Fisher information $I(\theta^*)$ is invertible then under assumptions
(A1)-(A5) the MLE will be asymptotically normal, see Douc, Moulines and
Ryden (2004) for more details.

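(A1) The parameter vector $\theta^*$ belongs to the interior of $\Theta$ and
$\theta = \theta^*$ if and only if
$\overline{\mathbb{P}}_\theta(\ldots, Y_{-1}, Y_0, Y_1, \ldots) = \overline{\mathbb{P}}_{\theta^*}(\ldots, Y_{-1}, Y_0, Y_1, \ldots)$.

(A2) For all $y \in \mathcal{Y}$, $x, x' \in \mathcal{X}$, the mappings
$\theta \to q_\theta(x, x')$ and $\theta \to g_\theta(y|x)$ are continuous
w.r.t. $\theta$.

(A3) There exist constants $\underline{c}_1, \bar{c}_1 \in (0, \infty)$ such
that for every $y \in \mathcal{Y}$, $x, x' \in \mathcal{X}$,
$\theta \in \Theta$

$$\underline{c}_1 \le q_\theta(x, x'),\; g_\theta(y|x) \le \bar{c}_1.$$

For the remaining assumptions we assume that there exists an open ball
$G \subset \Theta$ centered at $\theta^*$ such that

(A4) For all $y \in \mathcal{Y}$, $x, x' \in \mathcal{X}$, the mappings
$\theta \to q_\theta(x, x')$ and $\theta \to g_\theta(y|x)$ are twice
continuously differentiable on $G$.

(A5) There exists a constant $c_2 \in (0, \infty)$ such that for every
$y \in \mathcal{Y}$, $x, x' \in \mathcal{X}$, $\theta \in G$

$$|\nabla_\theta \log q_\theta(x, x')|,\; |\nabla_\theta \log g_\theta(y|x)|,\; |\nabla^2_\theta \log q_\theta(x, x')|,\; |\nabla^2_\theta \log g_\theta(y|x)| \le c_2.$$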

Remark 1. In general assumptions (A3) and (A5) hold when the state 
space X is compact and when the conditional laws of the observed state given 
the hidden state are heavy tailed, see for example Section 7. However we 
expect that the behaviours predicted by Theorems 1, 2, 3 and 5 will provide 
a good qualitative guide to the behaviour of ABC MLE in practice even in
cases where the underlying HMMs do not satisfy these assumptions. 

Assumptions (A1)-(A5) are sufficient to show that in some sense the ABC 
MLE inherits its asymptotic properties from the standard MLE. The
Lipschitz assumptions below will be used to establish quantitative bounds 
on the relative performance of the ABC MLE estimator with respect to that 
of the MLE. 

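(A6) There exists an $L \in (0, \infty)$ such that for all
$x \in \mathcal{X}$, $y, y' \in \mathcal{Y}$, $\theta \in \Theta$

$$|g_\theta(y|x) - g_\theta(y'|x)| \le L |y - y'|.$$

(A7) There exists an $L \in (0, \infty)$ such that for all
$x \in \mathcal{X}$, $y, y' \in \mathcal{Y}$, $\theta \in \Theta$

$$|\nabla_\theta g_\theta(y|x) - \nabla_\theta g_\theta(y'|x)| \le L |y - y'|.$$
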
3. Approximate Bayesian Computation. 

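3.1. Estimation Procedure. Following Jasra et al. (2010) we consider the
ABC approximation to the likelihood of a sequence of observations
$\hat{Y}_1, \ldots, \hat{Y}_n$ for some fixed $\theta \in \Theta$ given by

$$\mathbb{P}_\theta\left( Y_1 \in B^{\epsilon}_{\hat{Y}_1}, \ldots, Y_n \in B^{\epsilon}_{\hat{Y}_n} \right) = \int \left[ \prod_{k=1}^{n} q_\theta(x_{k-1}, x_k)\, \mathbb{I}_{B^{\epsilon}_{\hat{Y}_k}}(y_k)\, g_\theta(y_k|x_k) \right] \pi_0(dx_0)\, \mu(dx_{1:n})\, \nu(dy_{1:n}). \qquad (3)$$

The purpose of this paper is to analyse the asymptotic properties of the
ABC parameter estimator for HMMs defined by

Procedure 1 (ABC MLE). Given $\epsilon > 0$ and data
$\hat{Y}_1, \ldots, \hat{Y}_n$, estimate $\theta^*$ with

$$\hat{\theta}^{\epsilon}_n = \arg\max_{\theta \in \Theta} \mathbb{P}_\theta\left( Y_1 \in B^{\epsilon}_{\hat{Y}_1}, \ldots, Y_n \in B^{\epsilon}_{\hat{Y}_n} \right). \qquad (4)$$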



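The key to our analysis is the following observation which is, to our
knowledge, original:

$$\int \left[ \prod_{k=1}^{n} q_\theta(x_{k-1}, x_k)\, \mathbb{I}_{B^{\epsilon}_{\hat{Y}_k}}(y_k)\, g_\theta(y_k|x_k) \right] \pi_0(dx_0)\, \mu(dx_{1:n})\, \nu(dy_{1:n}) \;\propto\; \int \left[ \prod_{k=1}^{n} q_\theta(x_{k-1}, x_k)\, g^{\epsilon}_\theta(\hat{Y}_k|x_k) \right] \pi_0(dx_0)\, \mu(dx_{1:n}) \qquad (5)$$

where

$$g^{\epsilon}_\theta(y|x) = \frac{1}{\nu(B^{\epsilon}_{y})} \int_{B^{\epsilon}_{y}} g_\theta(y'|x)\, \nu(dy') \qquad (6)$$

and where we note that by Lemma 7 the quantity in (6) is well defined $\nu$
a.s.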
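
To see where (5) and (6) come from, it may help to write out the step for a single observation. The following is a sketch of that calculation using only the definitions above, and assuming $\nu(B^{\epsilon}_{\hat{Y}_k}) > 0$ (which the text attributes to Lemma 7):

```latex
\begin{align*}
\int \mathbb{I}_{B^{\epsilon}_{\hat{Y}_k}}(y_k)\, g_\theta(y_k \mid x_k)\, \nu(dy_k)
  &= \int_{B^{\epsilon}_{\hat{Y}_k}} g_\theta(y' \mid x_k)\, \nu(dy') \\
  &= \nu\big(B^{\epsilon}_{\hat{Y}_k}\big)\, g^{\epsilon}_\theta(\hat{Y}_k \mid x_k).
\end{align*}
```

Applying this identity for each $k = 1, \ldots, n$ inside the integral in (3), and pulling the factors $\nu(B^{\epsilon}_{\hat{Y}_1}), \ldots, \nu(B^{\epsilon}_{\hat{Y}_n})$ (none of which depends on $\theta$) outside the integral, gives the proportionality in (5).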

The crucial point is that the quantity $g^{\epsilon}_\theta(y|x)$ defined in
(6) is the density of the measure obtained by convolving the measure
corresponding to $g_\theta(y|x)$ with $\mathcal{U}_{B^{\epsilon}_{0}}$, where
the density is taken w.r.t. the new dominating measure obtained by
convolving $\nu$ with $\mathcal{U}_{B^{\epsilon}_{0}}$. One can then
immediately see that the quantities $q_\theta(x, x')$ and
$g^{\epsilon}_\theta(y|x)$ appearing in (5) are the transition kernels and
conditional laws respectively for a perturbed HMM
$\{X_k, Y^{\epsilon}_k\}_{k \ge 0}$ defined such that it is equal in law to
the process

$$\{X_k, Y_k + \epsilon Z_k\}_{k \ge 0} \qquad (7)$$

where $\{X_k, Y_k\}_{k \ge 0}$ is the original HMM and the
$\{Z_k\}_{k \ge 0}$ are an i.i.d. sequence of $\mathcal{U}_{B^{1}_{0}}$
distributed random variables. Crucially the constant of proportionality in
(5), which by definition is equal to
$\nu(B^{\epsilon}_{\hat{Y}_1}) \times \cdots \times \nu(B^{\epsilon}_{\hat{Y}_n})$,
is by Lemma 7 non-zero $\nu^{\otimes n}$ a.s. and is independent of the
parameter value $\theta$. Thus it follows that (4) is statistically
identical to the estimator

$$\hat{\theta}^{\epsilon}_n = \arg\max_{\theta \in \Theta} p^{\epsilon}_\theta(\hat{Y}_1, \ldots, \hat{Y}_n) \qquad (8)$$

where $p^{\epsilon}_\theta(\cdots)$ denotes the likelihood of the
observations w.r.t. the law of the perturbed process
$\{X_k, Y^{\epsilon}_k\}_{k \ge 0}$. The value of expressing the ABC
estimator (4) in the mathematically equivalent form (8) is that (8) reveals
the underlying mathematical structure of the estimator and furthermore, as
we shall see in the next section, expresses it in a form which is
particularly tractable to analysis.

We note that our observations (5) and (6) are similar in spirit to those
made in Wilkinson (2008). However in that paper the author takes the point
of view that the original collection of HMMs for which we are trying to
perform inference is itself misspecified.

3.2. Theoretical Results. It follows from the previous section that
performing ABC MLE is equivalent to estimating the parameter by taking a
data set generated by one of the original HMMs $\{X_k, Y_k\}_{k \ge 0}$ and
finding the value of $\theta$ which maximises the likelihood of that data
set under the corresponding perturbed HMM
$\{X_k, Y^{\epsilon}_k\}_{k \ge 0}$. Thus the ABC MLE estimator will
effectively suffer from the problem of model mis-specification. This raises
the question of whether the resulting estimator will still be asymptotically
consistent. As the following example shows one must expect that, in general,
the answer to this question will be no.

Example 1. For each $\theta \in [0, 1]$ let $\{Y_k\}_{k \ge 0}$ be a directly
observed sequence of i.i.d. random variables with common law

$$Y_k = \begin{cases} -\theta & \text{w.p. } 0.5 \\ \theta & \text{w.p. } 0.5 \end{cases}$$

and let $\theta^*$ denote the true value of the model parameter. Then for
any $\epsilon > 0$ the ABC MLE will not be asymptotically consistent even
though the MLE estimator is asymptotically consistent for any value of
$\theta^*$. Furthermore for $2\theta^* > \epsilon > \theta^* > 0$ the ABC
approximation to the likelihood is maximized at $\theta = 0$ for any
sequence of observations.

Although the ABC MLE estimator is no longer asymptotically consistent
we show the following below. Almost surely the ABC MLE will converge,
with increasing sample size, to a given point in parameter space (more
generally the set of accumulation points will belong to a given subset of
parameter space). Further, we show that these accumulation points must lie
in some neighbourhood of the true parameter value and that the size of this
neighbourhood shrinks to zero as $\epsilon$ goes to zero (Theorem 2).
Finally we show that under certain Lipschitz conditions one can obtain a
rate for the decrease in the size of these neighbourhoods (Theorem 3). We
note that these results are very much misspecified MLE results in the spirit
of, for example, White (1982). However because the dominating measures of
the original and perturbed HMMs are no longer necessarily mutually
absolutely continuous with respect to each other they can no longer be
interpreted in terms of minimising Kullback-Leibler distances.

Before we present our results we first discuss some technical issues that
arise in their proofs. It is tempting to try and understand the behaviour
of the ABC MLE by extending the parameter space to include $\epsilon$ and
then applying standard results from the theory of MLE. Unfortunately the
existing theory of MLE requires that the perturbed likelihoods
$g^{\epsilon}_\theta(y|x)$ (see (6)) be continuous w.r.t. $\epsilon$ which is
not true for general dominating measures $\nu$. The essence of our method is
to show that despite this certain asymptotic quantities associated with the
likelihoods of the perturbed processes (7) are still sufficiently continuous
as functions of $\epsilon$. In order to do this we need to establish that in
some probabilistic sense the order of the operations of differentiating and
taking asymptotic limits can be interchanged. It is this that constitutes
the bulk of Appendix B.
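
Example 1 can be checked numerically. The following is a minimal sketch, assuming (as read from Example 1) that the common law puts mass 0.5 on each of $-\theta$ and $\theta$. The numerical values of $\theta^*$ and $\epsilon$, the grid and the function names are hypothetical choices made only for illustration.

```python
import numpy as np


def abc_likelihood(theta, data, eps):
    """ABC likelihood prod_k P_theta(Y_k in B^eps_{y_k}) for Y_k = -theta or +theta, each w.p. 0.5."""
    data = np.asarray(data, dtype=float)
    hit_plus = np.abs(theta - data) <= eps    # atom at +theta lies in the ball
    hit_minus = np.abs(-theta - data) <= eps  # atom at -theta lies in the ball
    per_obs = 0.5 * hit_plus + 0.5 * hit_minus
    return float(np.prod(per_obs))


def abc_maximisers(data, eps, grid):
    """Grid points at which the ABC likelihood attains its maximum over the grid."""
    values = np.array([abc_likelihood(t, data, eps) for t in grid])
    return grid[values == values.max()]


theta_star = 0.4
eps = 0.65  # satisfies 2 * theta_star > eps > theta_star
rng = np.random.default_rng(0)
data = theta_star * rng.choice([-1.0, 1.0], size=50)

grid = np.linspace(0.0, 1.0, 101)
maximisers = abc_maximisers(data, eps, grid)
print(maximisers.min(), maximisers.max())
```

In this configuration every maximiser on the grid lies strictly below $\theta^*$ and $\theta = 0$ is among them, so more data cannot move the ABC MLE to the true value.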

In order to state and prove these results it is convenient to make the
following definitions. For any $\theta \in \Theta$ and $\epsilon > 0$, let

$$l^{\epsilon}(\theta) = \overline{\mathbb{E}}_{\theta^*}\left[ \log p^{\epsilon}_\theta(Y_1 | Y_{-\infty:0}) \right] \qquad (9)$$

where $p^{\epsilon}_\theta(\cdot|\cdot)$ denotes the conditional laws of the
observations of the perturbed processes (7) given the infinite past and the
expectations are taken with respect to the stationary measure of the
unperturbed HMM with parameter $\theta^*$. Further for $\epsilon = 0$ we let

$$l^{0}(\theta) = l(\theta) = \overline{\mathbb{E}}_{\theta^*}\left[ \log p_\theta(Y_1 | Y_{-\infty:0}) \right]. \qquad (10)$$

Our first result shows that the ABC MLE is asymptotically biased.

Theorem 1. Assume (A2)-(A3). Then for every $\epsilon > 0$,
$\sup_{\theta \in \Theta} l^{\epsilon}(\theta)$ is achieved. Further let

$$\mathcal{T}^{\epsilon} = \arg\max_{\theta \in \Theta} l^{\epsilon}(\theta)$$

be the set of these maximizers, then for any initial distribution $\pi_0$ we
have that almost surely every accumulation point of the sequence of
estimators $\hat{\theta}^{\epsilon}_1, \hat{\theta}^{\epsilon}_2, \ldots$
defined in Procedure 1 belongs to $\mathcal{T}^{\epsilon}$.

Proof. It follows from (A2) and (A3) that for the perturbed HMM defined in
(7) the conditional laws $p^{\epsilon}_\theta(y_1|y_{-n:0})$ are continuous
w.r.t. $\theta$. Further it follows from (A3) and (34) that the conditional
laws $p^{\epsilon}_\theta(y_1|y_{-n:0})$ converge uniformly to the
conditional laws $p^{\epsilon}_\theta(y_1|y_{-\infty:0})$ and are uniformly
bounded, both above and away from zero. It then follows that the conditional
log-likelihood functions $\log p^{\epsilon}_\theta(y_1|y_{-n:0})$ are
continuous, uniformly bounded and converge uniformly to
$\log p^{\epsilon}_\theta(y_1|y_{-\infty:0})$ and hence that the expected
values
$\overline{\mathbb{E}}_{\theta^*}\left[ \log p^{\epsilon}_\theta(Y_1|Y_{-\infty:0}) \right]$
are also continuous functions of $\theta \in \Theta$. The first part of the
theorem then follows from the compactness of $\Theta$.

The second part of the result now follows from (A2) and (A3) by using
the same arguments as used by Douc, Moulines and Ryden (2004) to prove
the asymptotic consistency of the MLE. We leave it to the reader to check
the details. □

Although Theorem 1 shows that the ABC MLE is asymptotically biased,
the following result shows that this error can be made arbitrarily small by
choosing a sufficiently small $\epsilon$.

Theorem 2. Assume (A1)-(A3). Then

$$\lim_{\epsilon \to 0} \sup_{\theta \in \mathcal{T}^{\epsilon}} |\theta - \theta^*| = 0. \qquad (11)$$



Remark 2. Theorems 1 and 2 provide a theoretical justification for the 
ABC MLE procedure analogous to that provided for the standard MLE pro- 
cedure by the classical notion of asymptotic consistency. In particular they 
show that an arbitrary degree of accuracy in the parameter estimate can be 
achieved given sufficient data and a sufficiently small $\epsilon$.

In order to prove Theorem 2 we need the following Lemma whose proof
is relegated to Appendix B.

Lemma 1. Assume (A2)-(A3). Then the mapping
$(\theta, \epsilon) \in \Theta \times [0, \infty) \to l^{\epsilon}(\theta)$
is continuous in $\theta$ and right continuous in $\epsilon$ in the sense
that for all pairs of sequences $\theta_n \to \theta$ and
$\epsilon_n \downarrow \epsilon$ we have that

$$l^{\epsilon_n}(\theta_n) \to l^{\epsilon}(\theta).$$

Proof of Theorem 2. In order to prove (11), given that by Lemma 1 the
mapping
$(\theta, \epsilon) \in \Theta \times [0, \infty) \to l^{\epsilon}(\theta) - l^{\epsilon}(\theta^*)$
is continuous, it is sufficient to show that for any $\delta > 0$ there
exists an $\epsilon' > 0$ such that
$\mathcal{T}^{\epsilon} \subset B^{\delta}_{\theta^*}$ for all
$\epsilon < \epsilon'$. Suppose that this property does not hold. Then, by
the compactness of $\Theta$, there must exist $\delta > 0$ and sequences
$\epsilon_n \downarrow 0$ and
$\theta_n \to \theta \in \{\theta' : |\theta' - \theta^*| \ge \delta\}$ such
that

$$l^{\epsilon_n}(\theta_n) - l^{\epsilon_n}(\theta^*) \ge 0$$

for all $n$. However it would then follow from the continuity of
$l^{\epsilon}(\theta) - l^{\epsilon}(\theta^*)$ that
$l(\theta) \ge l(\theta^*)$ which violates (A1). (In Douc, Moulines and
Ryden (2004) it is shown that under (A2) and (A3), (A1) is equivalent to
having that $l(\theta^*) > l(\theta)$ for all $\theta \ne \theta^*$.) □

The next result shows that, under some additional assumptions, we can
characterise the rate at which the asymptotic error in the ABC MLE
decreases with $\epsilon$.

Theorem 3. Assume (A1)-(A7) and that the asymptotic Fisher information
matrix $I(\theta^*)$ is invertible. Then there exist finite positive
constants $C, \bar{\epsilon}$ such that for all
$\epsilon \le \bar{\epsilon}$

$$\sup_{\theta \in \mathcal{T}^{\epsilon}} |\theta - \theta^*| \le C \epsilon.$$
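
The proof of Theorem 3 relies on the following lemma whose proof is given
in Appendix B.

Lemma 2. Assume (A1)-(A7). Then $\nabla_\theta l^{\epsilon}$,
$\nabla_\theta l$ and $\nabla^2_\theta l$ exist for all $\theta \in G$ where
$G$ is as in (A4) and (A5). Furthermore

$$\sup_{\theta \in G} |\nabla_\theta l^{\epsilon}(\theta) - \nabla_\theta l(\theta)| \le R \epsilon \qquad (12)$$

for some $R > 0$ and $\nabla^2_\theta l(\theta^*) = I(\theta^*)$.

Proof of Theorem 3. Since by assumption $I(\theta^*)$ is invertible and
thus positive definite it follows that there exists some $T > 0$ such that

$$\inf_{v : |v| > 0} \frac{|I(\theta^*) v|}{|v|} \ge T. \qquad (13)$$

By Lemma 2, $l(\theta)$ is twice continuously differentiable on $G$ and so
there exists a constant $\delta > 0$ such that

$$\sup_{|\theta - \theta^*| \le \delta} \|\nabla^2_\theta l(\theta) - I(\theta^*)\| \le \frac{T}{2}. \qquad (14)$$

By Theorem 2 there exists a constant $\bar{\epsilon} > 0$ such that for all
$\epsilon \le \bar{\epsilon}$,

$$\sup_{\theta \in \mathcal{T}^{\epsilon}} |\theta - \theta^*| \le \delta. \qquad (15)$$

Consider any $\theta^{\epsilon} \in \mathcal{T}^{\epsilon}$. By Lemma 2 both
$\nabla_\theta l^{\epsilon}(\theta^{\epsilon})$ and
$\nabla_\theta l(\theta^*)$ exist and clearly they must both be equal to
zero and hence by (12)

$$|\nabla_\theta l(\theta^{\epsilon})| \le R \epsilon. \qquad (16)$$

Further by the fundamental theorem of calculus

$$\nabla_\theta l(\theta^{\epsilon}) = \nabla_\theta l(\theta^*) + \left( \int_0^1 \nabla^2_\theta l\big(\theta^* + t(\theta^{\epsilon} - \theta^*)\big)\, dt \right) (\theta^{\epsilon} - \theta^*). \qquad (17)$$

By (13), (14) and (15) it now follows that

$$\left| \left( \int_0^1 \nabla^2_\theta l\big(\theta^* + t(\theta^{\epsilon} - \theta^*)\big)\, dt \right) (\theta^{\epsilon} - \theta^*) \right| \ge \frac{T}{2} |\theta^{\epsilon} - \theta^*|. \qquad (18)$$

The result now follows from (16), (17) and (18). □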

Remark 3. In many cases the complete data sequence
$\hat{Y}_1, \ldots, \hat{Y}_n$ is too high-dimensional and instead one
performs inference using a summary statistic
$S(\hat{Y}_1, \ldots, \hat{Y}_n)$ where $S(\cdots)$ is some mapping from
$\mathbb{R}^m \times \cdots \times \mathbb{R}^m$ to a lower dimensional
Euclidean space, e.g. see Tavaré et al. (1997). In general this mapping will
destroy the Markovian structure of the data and the results derived in this
section will not be applicable to ABC based parameter inference conducted
using the corresponding summary statistic.

However in practice it is often the case that the mapping $S(\cdots)$ is of
the form
$S(\hat{Y}_1, \ldots, \hat{Y}_n) = S(\hat{Y}_1), \ldots, S(\hat{Y}_n)$ for
some function $S(\cdot)$ that maps from $\mathbb{R}^m$ to a space of lower
dimension. When this is true it is easy to see that the Markovian structure
of the data is preserved. Moreover suppose that assumptions (A1)-(A7) hold
for the underlying HMM. If the mapping $S(\cdot)$ preserves the
identifiability of the system, that is to say if assumption (A1) also holds
for the HMMs with observations $S(Y_1), S(Y_2), \ldots$, then it is trivial
to see that assumptions (A2)-(A7) will also be preserved for all reasonable
choices of $S(\cdot)$ and thus that Theorems 1, 2 and 3 will also hold for
ABC MLE performed using the summary statistic.

4. Noisy Approximate Bayesian Computation. 

4.1. Estimation Procedure. In the previous section we showed that
performing ABC MLE is equivalent to estimating the parameter by choosing
the value of the maximizer of the likelihoods of the perturbed HMMs
$\{X_k, Y^{\epsilon}_k\}_{k \ge 0}$ defined in (7). Since the likelihoods
over which we maximise are misspecified with respect to the law of the
process that is generating the data the resulting estimator has an inherent
asymptotic bias.

Suppose now that a sequence of observations $\hat{Y}_1, \ldots, \hat{Y}_n$
from the unperturbed HMM corresponding to some $\theta^* \in \Theta$ is
given. The sequence of noisy observations
$\hat{Y}_1 + \epsilon Z_1, \ldots, \hat{Y}_n + \epsilon Z_n$, where the
$Z_k \sim \mathcal{U}_{B^{1}_{0}}$, $k \ge 1$, are i.i.d., has the same law
as a sample from the corresponding perturbed HMM defined in (7). As a result
estimating $\theta^*$ by applying the ABC MLE estimator (4) to the noisy
observations $\hat{Y}_1 + \epsilon Z_1, \ldots, \hat{Y}_n + \epsilon Z_n$ in
place of $\hat{Y}_1, \ldots, \hat{Y}_n$ is statistically equivalent to
estimating $\theta^*$ by applying standard MLE to the perturbed HMMs (7).
Clearly one would expect that the resulting estimator would inherit the
properties of MLE, in particular that it would be asymptotically
consistent. In light of the discussion and remarks immediately following the
definition of Procedure 1 these observations lead one to the following noisy
ABC MLE procedure:

Procedure 2 (Noisy ABC MLE). Given $\epsilon > 0$ and data
$\hat{Y}_1, \ldots, \hat{Y}_n$ estimate $\theta^*$ with

$$\tilde{\theta}^{\epsilon}_n = \arg\sup_{\theta \in \Theta} \mathbb{P}_\theta\left( Y_1 \in B^{\epsilon}_{\hat{Y}_1 + \epsilon Z_1}, \ldots, Y_n \in B^{\epsilon}_{\hat{Y}_n + \epsilon Z_n} \right). \qquad (19)$$

Remark 4. Procedure 2 is a likelihood-based version of the noisy ABC 
method in Fearnhead and Prangle (2010). 
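
The only extra step that Procedure 2 adds to Procedure 1 is the perturbation of the data by i.i.d. noise that is uniform on the ball of radius $\epsilon$. The following is a minimal sketch of that step, assuming the data are stored as an array with one row per observation in $\mathbb{R}^m$. The function names are hypothetical, and the ball is sampled by the standard construction of a uniformly distributed direction scaled by a radius distributed as $U^{1/m}$.

```python
import numpy as np


def sample_unit_ball(n, m, rng):
    """Draw n points uniformly from the closed unit ball in R^m."""
    direction = rng.normal(size=(n, m))
    direction /= np.linalg.norm(direction, axis=1, keepdims=True)  # uniform on the sphere
    radius = rng.uniform(size=(n, 1)) ** (1.0 / m)                 # radial law of a uniform ball
    return radius * direction


def add_abc_noise(y_obs, eps, rng):
    """Return the noisy data y_k + eps * Z_k with Z_k uniform on the unit ball."""
    y_obs = np.asarray(y_obs, dtype=float)
    n, m = y_obs.shape
    return y_obs + eps * sample_unit_ball(n, m, rng)


rng = np.random.default_rng(0)
y_obs = np.array([[0.1, 0.2], [0.0, -0.3], [0.5, 0.4], [-0.2, 0.1], [0.3, 0.3]])
y_noisy = add_abc_noise(y_obs, 0.25, rng)
print(y_noisy)
```

The noisy data would then be passed to whatever routine approximates the ABC likelihood (4), in place of the original observations.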

4.2. Theoretical Results. In this section we investigate mathematically
the noisy ABC MLE procedure defined in Section 4.1. In particular we show
that under the assumptions made in Section 2.2 the noisy ABC MLE
inherits the properties of asymptotic consistency and normality from the
MLE. Further we provide an analysis of the performance of the noisy ABC
MLE relative to the standard MLE by comparing their asymptotic variances.
It is first shown that the asymptotic Fisher information of the noisy ABC
MLE is strictly less than that of the MLE and hence that the asymptotic
variance of the noisy ABC MLE estimator is strictly greater. Thus it follows
that the noisy ABC MLE procedure comes at the cost of a loss in accuracy
relative to that of the standard ABC procedure. Finally we show that this
loss in accuracy can be made arbitrarily small by choosing $\epsilon$ small
enough.

The first result establishes that under (A1)-(A3) the noisy ABC MLE
inherits the property of asymptotic consistency.

Theorem 4. Assume (A1)-(A3). Then Procedure 2 is asymptotically 
consistent. 

Proof. It is sufficient to show that if (A1)-(A3) hold for the original
HMM then they also hold for the perturbed HMM. Recall that, for the
perturbed HMM, the transitions are as for the original HMM and the
likelihood is as in (6). Thus (A3) for the original model immediately
implies (A3) for the perturbed model.

In order to establish that (A2) holds for the perturbed model it is
sufficient to observe that continuity of the mapping
$\theta \mapsto g_\theta^\epsilon(y|x)$ for any $x \in \mathcal{X}$,
$y \in \mathcal{Y}$ follows from continuity of the mapping
$\theta \mapsto g_\theta(y|x)$, uniform boundedness of $g_\theta(y|x)$
(i.e. (A3)) and the dominated convergence theorem.

It remains to show that (A1) is also inherited by the perturbed model.
This assumption is equivalent to demanding that for every
$\theta' \neq \theta$ there exists some $r$ such that

(20) $\mathcal{L}_\theta(Y_1, \ldots, Y_r) \neq \mathcal{L}_{\theta'}(Y_1, \ldots, Y_r)$

where $\mathcal{L}_\theta(\cdot)$ denotes the law of the process
$\{Y_k\}_{k \ge 0}$. However by applying Lemma 6 it immediately follows
that (20) holds if and only if

$\mathcal{L}_\theta(Y_1^\epsilon, \ldots, Y_r^\epsilon) \neq \mathcal{L}_{\theta'}(Y_1^\epsilon, \ldots, Y_r^\epsilon)$

for all $\epsilon$ and so (A1) holds for the original HMMs if and only if
it also holds for the perturbed HMMs. □
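
As a worked check of the characteristic-function condition that Lemma 6 requires of the noise, consider the one-dimensional case. This is an illustrative special case, assuming $m = 1$ so that $\epsilon Z$ is uniform on $[-\epsilon, \epsilon]$:

```latex
\varphi_{\epsilon Z}(\lambda)
  = \frac{1}{2\epsilon} \int_{-\epsilon}^{\epsilon} e^{i \lambda z} \, dz
  = \frac{\sin(\epsilon \lambda)}{\epsilon \lambda},
\qquad
\{ \lambda : \varphi_{\epsilon Z}(\lambda) = 0 \}
  = \{ k \pi / \epsilon : k \in \mathbb{Z} \setminus \{0\} \}.
```

The zero set is countable and hence has Lebesgue measure zero, so in this case the hypothesis of Lemma 6 holds for every $\epsilon > 0$.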

Next we consider the question of asymptotic normality. In Douc, Moulines
and Ryden (2004) it was shown that under conditions (A1)-(A5) the MLE
for HMMs has asymptotic Fisher information matrix $I(\theta^*)$ where

$I(\theta^*) = E_{\theta^*}\left[ \nabla_\theta \log p_{\theta^*}(Y_1 | Y_{-\infty:0}) \, \nabla_\theta \log p_{\theta^*}(Y_1 | Y_{-\infty:0})^T \right]$.

Further it was shown that if $I(\theta^*)$ is invertible then the MLE is
asymptotically normal with asymptotic variance equal to
$I(\theta^*)^{-1}$. It follows from the proof of Theorem 4 that if
(A1)-(A3) hold for the original HMM then they also hold for the perturbed
HMM. Further, if (A4) and (A5) hold for the original HMM then a simple
application of the dominated convergence theorem shows that they also
hold for the perturbed HMM. Thus, under assumptions (A1)-(A5) the
asymptotic Fisher information matrix of the noisy ABC MLE exists and is
equal to $I^\epsilon(\theta^*)$ where

$I^\epsilon(\theta^*) = E_{\theta^*}\left[ \nabla_\theta \log p^\epsilon_{\theta^*}(Y_1^\epsilon | Y^\epsilon_{-\infty:0}) \, \nabla_\theta \log p^\epsilon_{\theta^*}(Y_1^\epsilon | Y^\epsilon_{-\infty:0})^T \right]$.

Moreover if $I^\epsilon(\theta^*)$ is invertible then the noisy ABC MLE
estimator will be asymptotically normal with asymptotic variance equal to
$I^\epsilon(\theta^*)^{-1}$. Using these results we can analyze the
asymptotic performance of the noisy ABC MLE estimator relative to that of
the standard MLE estimator by comparing the two Fisher information
matrices. Unfortunately one cannot in general make any explicit
quantitative comparisons between these two quantities; however, the
following result establishes some qualitative relations between the two.



Theorem 5. Assume (A1)-(A5). Then:

1. $I(\theta^*) \ge I^\epsilon(\theta^*)$. Further if $\nu$ is connected and $I(\theta^*) \neq 0$ (see Section 2.1)
then the inequality is strict.

2. $I^\epsilon(\theta^*) \to 0$ as $\epsilon \to \infty$.

3. $I^\epsilon(\theta^*) \to I(\theta^*)$ as $\epsilon \to 0$. Hence for $\epsilon$ sufficiently small the noisy ABC
MLE is asymptotically normal with asymptotic variance equal to $I^\epsilon(\theta^*)^{-1}$.

4. If (A6) and (A7) hold then $\| I(\theta^*) - I^\epsilon(\theta^*) \| = O(\epsilon^2)$.

Theorem 5 tells us that the asymptotic variance of the noisy ABC MLE
estimator is strictly greater than that of the MLE estimator and hence
that there is a loss in accuracy relative to the MLE in using noisy ABC
MLE. For very large values of $\epsilon$ the asymptotic variance of the
noisy ABC MLE grows without bound and the loss in accuracy becomes almost
complete. Thus if one chooses values of $\epsilon$ which are too large the
noisy ABC MLE becomes ineffective. Furthermore we have shown that by
taking small enough values of $\epsilon$ the loss in accuracy can be made
arbitrarily small and hence that we can obtain (ignoring computational
issues) a performance of the noisy ABC MLE arbitrarily close to that of
the MLE. Finally, the theorem provides a rate of convergence for the
Fisher information matrices when the likelihoods obey certain simple
Lipschitz assumptions.
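
The qualitative behaviour described by Theorem 5 can be checked numerically in a much simpler setting. The sketch below is an illustration only and is not the HMM setting of the theorem: it assumes a scalar i.i.d. location model $Y \sim N(\theta, 1)$ perturbed by $\epsilon Z$ with $Z$ uniform on $[-1, 1]$, for which the perturbed density is available in closed form. The function name `noisy_fisher_information` and the grid settings are assumptions made for this example.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def normal_pdf(x):
    return np.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def normal_cdf(x):
    return 0.5 * (1.0 + _erf(x / math.sqrt(2.0)))

def noisy_fisher_information(eps, half_width=10.0, n_grid=100001):
    """Fisher information for theta in Y + eps*Z, Y ~ N(theta, 1), Z ~ U[-1, 1].

    The model is a location family, so the value does not depend on theta
    and is evaluated at theta = 0 by numerical integration.
    """
    y = np.linspace(-(half_width + eps), half_width + eps, n_grid)
    # Density of Y + eps*Z at theta = 0.
    dens = (normal_cdf(y + eps) - normal_cdf(y - eps)) / (2.0 * eps)
    # Derivative of that density with respect to theta at theta = 0.
    d_dens = (normal_pdf(y - eps) - normal_pdf(y + eps)) / (2.0 * eps)
    # Skip the far tails, where the density is too small to divide by safely.
    mask = dens > 1e-12
    integrand = np.zeros_like(y)
    integrand[mask] = d_dens[mask] ** 2 / dens[mask]
    dy = y[1] - y[0]
    # Trapezoidal rule.
    return dy * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))

for eps in (0.1, 0.5, 2.0, 10.0):
    print(eps, noisy_fisher_information(eps))
```

In this toy model the unperturbed Fisher information equals 1. The computed values stay below 1, approach 1 as $\epsilon$ shrinks (with a loss of roughly $\epsilon^2/3$ for small $\epsilon$) and decay towards 0 as $\epsilon$ grows.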

The proof of Theorem 5 is based on the following lemma, see Appendix 
B for the proof. 



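Lemma 3. Assume (A1)-(A5). Then

$I(\theta^*) = I^\epsilon(\theta^*) + E_{\theta^*}\left[ I^{Y_0:Y_0^\epsilon}_{Y_{-\infty:-1};Y^\epsilon_{1:\infty}}(\theta^*) \right]$

where for every doubly infinite sequence $Y_{-\infty:-1}; Y^\epsilon_{1:\infty}$ the random variable
$I^{Y_0:Y_0^\epsilon}_{Y_{-\infty:-1};Y^\epsilon_{1:\infty}}(\theta^*)$ is equal to the difference in the Fisher informations of the
conditional laws of $Y_0$ and $Y_0^\epsilon$ given $Y_{-\infty:-1}; Y^\epsilon_{1:\infty}$, that is

$I^{Y_0:Y_0^\epsilon}_{Y_{-\infty:-1};Y^\epsilon_{1:\infty}}(\theta^*)
 = E_{\theta^*}\left[ \nabla_\theta \log p_{\theta^*}(Y_0 | Y_{-\infty:-1}; Y^\epsilon_{1:\infty}) \, \nabla_\theta \log p_{\theta^*}(Y_0 | Y_{-\infty:-1}; Y^\epsilon_{1:\infty})^T \,\middle|\, Y_{-\infty:-1}; Y^\epsilon_{1:\infty} \right]$
$\quad - E_{\theta^*}\left[ \nabla_\theta \log p_{\theta^*}(Y_0^\epsilon | Y_{-\infty:-1}; Y^\epsilon_{1:\infty}) \, \nabla_\theta \log p_{\theta^*}(Y_0^\epsilon | Y_{-\infty:-1}; Y^\epsilon_{1:\infty})^T \,\middle|\, Y_{-\infty:-1}; Y^\epsilon_{1:\infty} \right]$.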



Remark 5. The quantity $I^{Y_0:Y_0^\epsilon}_{Y_{-\infty:-1};Y^\epsilon_{1:\infty}}(\theta^*)$ is also equal to the missing
information in the conditional law of $Y_0^\epsilon$ relative to that in the conditional
law of $Y_0$ (where both laws are conditioned on $Y_{-\infty:-1}; Y^\epsilon_{1:\infty}$). Here the term
missing information is meant in the sense of that proposed for i.i.d. random
variables in Orchard and Woodbury (1972). Hence, Lemma 3 can be considered
as a conditional asymptotic missing information principle for HMMs
with observations perturbed by uniform additive noise.



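Theorem 5 is then an immediate corollary of the following lemma which
establishes the behaviour of
$E_{\theta^*}\left[ I^{Y_0:Y_0^\epsilon}_{Y_{-\infty:-1};Y^\epsilon_{1:\infty}}(\theta^*) \right]$
for different values of $\epsilon$.

Lemma 4. Assume (A1)-(A5). Then:

1. $E_{\theta^*}\left[ I^{Y_0:Y_0^\epsilon}_{Y_{-\infty:-1};Y^\epsilon_{1:\infty}}(\theta^*) \right]$ is positive semi-definite. Further if $\nu$ is connected
and $I(\theta^*) \neq 0$ then $E_{\theta^*}\left[ I^{Y_0:Y_0^\epsilon}_{Y_{-\infty:-1};Y^\epsilon_{1:\infty}}(\theta^*) \right] \neq 0$ for any $\epsilon > 0$.

2. $E_{\theta^*}\left[ I^{Y_0:Y_0^\epsilon}_{Y_{-\infty:-1};Y^\epsilon_{1:\infty}}(\theta^*) \right] \to I(\theta^*)$ as $\epsilon \to \infty$.

3. $E_{\theta^*}\left[ I^{Y_0:Y_0^\epsilon}_{Y_{-\infty:-1};Y^\epsilon_{1:\infty}}(\theta^*) \right] \to 0$ as $\epsilon \to 0$.

Assume that (A6) and (A7) also hold. Then

4. $\left\| E_{\theta^*}\left[ I^{Y_0:Y_0^\epsilon}_{Y_{-\infty:-1};Y^\epsilon_{1:\infty}}(\theta^*) \right] \right\| = O(\epsilon^2)$.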



The proof of Lemma 4 is again deferred to Appendix B. 

Remark 6. Comments similar to those in Remark 3 concerning summary
statistics also hold for the results on the noisy ABC MLE given in this
section. In particular we note that given a summary statistic of the form
$S(Y_1), \ldots, S(Y_n)$ one can derive a result analogous to Theorem 5 in
which the Fisher information matrices $I(\theta^*)$ and
$I^\epsilon(\theta^*)$ are replaced with the Fisher information matrices
for the HMMs $S(Y_1), \ldots$ and $S(Y_1) + \epsilon Z_1, \ldots$, where
$S(Y_1) + \epsilon Z_1, \ldots$ is a perturbed version of
$S(Y_1), \ldots$ defined in an analogous manner to (7).

5. Smoothed ABC. ABC estimators based on Procedures 1 and 2 have an
inherent lack of smoothness due to the fact that the estimator
effectively gives weight one to points inside the balls
$B^\epsilon_{\hat Y_1}, \ldots, B^\epsilon_{\hat Y_n}$ and weight zero to
those outside them. As seen in the next section, this becomes
particularly problematic if one then tries to estimate these
probabilities using SMC algorithms, as the algorithm can collapse due to
the use of indicator functions; see Del Moral, Doucet and Jasra (2008a)
for some discussion.

A common way of smoothing ABC, see for example Beaumont, Zhang and
Balding (2002), is to approximate the likelihoods of a sequence of
observations $\hat Y_1, \ldots, \hat Y_n$ not with (3) but instead with the
smoothed approximations

(21) $E_\theta\left[ \phi\left( \frac{Y_1 - \hat Y_1}{\epsilon} \right) \cdots \phi\left( \frac{Y_n - \hat Y_n}{\epsilon} \right) \right]
 = \int \left[ \prod_{k=1}^n q_\theta(x_{k-1}, x_k) \, \phi\left( \frac{y_k - \hat Y_k}{\epsilon} \right) g_\theta(y_k | x_k) \right] \pi_0(dx_0) \, \mu(dx_{1:n}) \, \nu(dy_{1:n})$

where $\phi(\cdot)$ is the density w.r.t. Lebesgue measure of some smooth
probability distribution $\Phi$. One then estimates the parameters via
maximising (21).

By using exactly the same arguments as in Section 3.1 it is clear that the
smoothed ABC MLE estimator resulting from approximating the likelihoods of
a sequence of observations $\hat Y_1, \ldots, \hat Y_n$ with (21) for some
suitable kernel $\phi$ is statistically equivalent to the estimator
obtained by approximating the true likelihoods with the likelihoods of the
perturbed HMM defined to be

(22) $\{X_k, Y_k^\epsilon\}_{k \ge 0} := \{X_k, Y_k + \epsilon Z_k\}_{k \ge 0}$

where the $\{Z_k\}_{k \ge 0}$ are such that $Z_k \sim \Phi$ i.i.d. Further,
in an analogous manner to Section 4.1 one can define a smoothed noisy ABC
MLE by applying the smoothed ABC MLE defined above to noisy data of the
form $\hat Y_1 + \epsilon Z_1, \ldots, \hat Y_n + \epsilon Z_n$ where again
$Z_1, \ldots, Z_n$ are i.i.d. with law $\Phi$.

It is natural to ask whether results analogous to Theorems 1, 2, 3, 4 and
5 hold for the smoothed ABC MLE and the smoothed noisy ABC MLE. By
a careful reading of the proofs of these theorems one can see that analogous
results hold when the density of $\Phi$ satisfies the following conditions:

(i) $\phi(y) > 0$ for all $y \in \mathbb{R}^m$.

(ii) $\phi(\cdot)$ is continuously differentiable.

(iii) for the reference 

lim -, = t(V) V a.s.. 

(iv) $\int x^2 \phi(x) \, dx < \infty$.

We observe that these conditions hold for many commonly used smoothing 
distributions, in particular the Gaussian distribution. 

Finally it is noted that comments analogous to those in Remarks 3 and 6 
hold for the smoothed ABC MLE and smoothed noisy ABC MLE. Moreover 
the quantities (21) can be straight-forwardly estimated using SMC tech- 
niques, see the following section for more details. 

6. Implementing ABC via SMC. SMC algorithms are commonly used to
approximate conditional laws of the form $p(X_k | Y_{1:k})$ (we drop the
$\hat Y_k$ notation and omit dependence upon $\theta$ here). At each time
$k$ the conditional law of the hidden state is approximated by a
collection of $N$ particles, $x_k^1, \ldots, x_k^N$, as

(23) $\hat p(dx_k | Y_{1:k}) = \frac{1}{N} \sum_{i=1}^N \delta_{x_k^i}(dx_k)$.

The crucial feature of the SMC algorithm with respect to any form of
likelihood based parameter inference is that at each step
$\frac{1}{N} \sum_{i=1}^N g(Y_k | \bar x_k^i)$, computed from the particles
$\bar x_k^i$ proposed at time $k$, is an approximation to the conditional
likelihood $p(Y_k | Y_{1:k-1})$. Thus when the conditional likelihoods
$g(\cdot|\cdot)$ are tractable SMC algorithms can be used to generate
approximations to the full likelihoods $p(Y_1, \ldots, Y_n)$, e.g. see
Andrieu, Doucet and Tadic (2009) for the use of SMC for MLE in this
standard setting.

Consider now the ABC MLE and noisy ABC MLE procedures defined in
Sections 3 and 4 and recall that we approximate the true likelihoods with
the likelihoods of the perturbed HMMs (7). To see how standard SMC methods
can be implemented in the context of these estimators consider the
extended process $\{X_k, Y_k, Y_k^\epsilon\}_{k \ge 0}$ defined such that
$\{X_k, Y_k\}_{k \ge 0}$ are the hidden state and observation process of
the original HMM and for all $k \ge 0$, $Y_k^\epsilon = Y_k + \epsilon Z_k$
where $\{Z_k\}_{k \ge 0}$ is an i.i.d. sequence of $\mathcal{U}_{B_0^1}$
random variables. Clearly the marginal distributions of the observations
of the extended process are equal to those of the observations of the
perturbed HMMs defined in (7). Thus in order to compute the ABC
approximation to the likelihood of a sequence of observations
$Y_1, \ldots, Y_n$ it is sufficient to compute the likelihood of the
observations under the extended HMM detailed above. Since the conditional
densities of the observed state given the hidden state of the extended
HMM are trivial, the corresponding likelihoods may be computed using
standard SMC. This suggests the following SMC algorithm for evaluating
the ABC approximate likelihoods (3), see Jasra et al. (2010).

Algorithm 1. SMC for Computation of Approximate Bayesian Likelihood
$p^\epsilon(Y_1, \ldots, Y_n)$.

For $k = 1, \ldots, n$ do

1. Generate proposal states $(\bar x_k^1, y_k^1), \ldots, (\bar x_k^N, y_k^N)$ where each $\bar x_k^i \sim q(x_{k-1}^i, \cdot)$
and each $y_k^i \sim g(\cdot | \bar x_k^i)$.

2. Weight each proposed state $(\bar x_k^i, y_k^i)$ with $w_k^i = I_{B^\epsilon_{Y_k}}(y_k^i)$.

3. Renormalise the weights: $w_k^i \mapsto \bar w_k^i := w_k^i / \sum_{j=1}^N w_k^j$.

4. Generate the particles $x_k^1, \ldots, x_k^N$ by sampling multinomially from the
proposals $\bar x_k^1, \ldots, \bar x_k^N$ according to the weights $\bar w_k^1, \ldots, \bar w_k^N$.

Finally approximate the likelihood $p^\epsilon(Y_1, \ldots, Y_n)$ by $\prod_{k=1}^n \left( \frac{1}{N} \sum_{i=1}^N w_k^i \right)$.
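
The steps of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration for scalar observations, not the authors' implementation. The function name `abc_smc_loglik`, the two-state Gaussian model, its transition probability `STAY_PROB` and the simulated data are all assumptions made for the example. The estimate is accumulated on the log scale to avoid underflow of the product.

```python
import numpy as np

def abc_smc_loglik(data, eps, n_particles, sample_initial,
                   sample_transition, sample_observation, rng):
    """Log of the Algorithm 1 estimate of p^eps(Y_1, ..., Y_n)."""
    particles = sample_initial(n_particles, rng)
    log_like = 0.0
    for y_obs in data:
        # Step 1: propagate the particles and simulate pseudo-observations.
        proposals = sample_transition(particles, rng)
        pseudo_obs = sample_observation(proposals, rng)
        # Step 2: indicator weights for the eps-ball around the observation.
        weights = (np.abs(pseudo_obs - y_obs) < eps).astype(float)
        total = weights.sum()
        if total == 0.0:
            return -np.inf  # no pseudo-observation hit the ball: collapse
        # Product of averaged unnormalised weights, kept in log form.
        log_like += np.log(total / n_particles)
        # Steps 3-4: normalise the weights and resample multinomially.
        idx = rng.choice(n_particles, size=n_particles, p=weights / total)
        particles = proposals[idx]
    return log_like

# Hypothetical two-state model with Gaussian observations (illustration only).
STATES = np.array([-1.0, 1.0])
STAY_PROB = 0.9

def sample_initial(n, rng):
    return rng.integers(0, 2, size=n)          # index into STATES

def sample_transition(idx, rng):
    flip = rng.uniform(size=idx.size) >= STAY_PROB
    return np.where(flip, 1 - idx, idx)

def sample_observation(idx, rng):
    return STATES[idx] + rng.standard_normal(idx.size)

rng = np.random.default_rng(0)
hidden = sample_initial(1, rng)
data = []
for _ in range(50):
    hidden = sample_transition(hidden, rng)
    data.append(sample_observation(hidden, rng)[0])
data = np.array(data)

print(abc_smc_loglik(data, 0.5, 2000, sample_initial,
                     sample_transition, sample_observation, rng))
```

Returning minus infinity when every weight is zero makes the collapse of the indicator-based filter explicit. This is the failure mode that motivates the smoothed weights of Section 5.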

Similarly, given a distribution $\Phi$ with smooth density $\phi$ w.r.t.
Lebesgue measure, one can define a SMC algorithm for computing the
corresponding smoothed ABC approximations to the likelihoods in an
analogous manner; the details follow from Algorithm 1.
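
A smoothed variant can be sketched by replacing the indicator weights with kernel weights of the form appearing in (21). The Python code below is an illustrative sketch under stated assumptions: a Gaussian kernel, scalar observations and a hypothetical autoregressive state model. The names `gaussian_kernel` and `smoothed_abc_smc_loglik`, the model coefficients and the data are all made up for the example.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard normal density, one admissible choice of smoothing kernel."""
    return np.exp(-0.5 * u * u) / np.sqrt(2.0 * np.pi)

def smoothed_abc_smc_loglik(data, eps, n_particles, sample_initial,
                            sample_transition, sample_observation, rng,
                            kernel=gaussian_kernel):
    """Log of the SMC estimate of the smoothed ABC likelihood."""
    particles = sample_initial(n_particles, rng)
    log_like = 0.0
    for y_obs in data:
        proposals = sample_transition(particles, rng)
        pseudo_obs = sample_observation(proposals, rng)
        # Kernel weights phi((y_k^i - Y_k) / eps) replace the indicators.
        weights = kernel((pseudo_obs - y_obs) / eps)
        total = weights.sum()
        if total == 0.0:
            return -np.inf  # every weight underflowed to zero
        log_like += np.log(total / n_particles)
        idx = rng.choice(n_particles, size=n_particles, p=weights / total)
        particles = proposals[idx]
    return log_like

# Hypothetical autoregressive model (illustration only).
def sample_initial(n, rng):
    return rng.standard_normal(n)

def sample_transition(x, rng):
    return 0.8 * x + 0.5 * rng.standard_normal(x.size)

def sample_observation(x, rng):
    return x + rng.standard_normal(x.size)

rng = np.random.default_rng(0)
data = rng.standard_normal(30)   # placeholder observations
print(smoothed_abc_smc_loglik(data, 0.3, 1000, sample_initial,
                              sample_transition, sample_observation, rng))
```

Because the Gaussian kernel is strictly positive, the weights vanish only through numerical underflow, so this variant is much less prone to collapse than the indicator version for small $\epsilon$.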

Note that in general one does not have to resample the particles at ev- 
ery step and more efficient approaches may be possible, see for example 
Del Moral, Doucet and Jasra (2008b) and the references therein. A detailed 
analysis of the SMC method, including description of resampling and con- 
vergence results can be found in Doucet, De Freitas and Gordon (2001) and 
Del Moral (2004). 

7. Numerical Example. It is common in economics to model the log
returns of a sequence of price data using a HMM. Typically one uses the
hidden state to model certain underlying economic factors which cannot
be directly observed and the observed state to model the log returns of
the prices themselves. Furthermore it has become increasingly common to
model the distribution of the log returns of asset prices using
$\alpha$-stable distributions due to their seemingly good fit to the
actual data, see for example Rachev and Mittnik (2000). Unfortunately the
likelihoods of $\alpha$-stable distributions are intractable and so using
them presents difficulties when trying to infer model parameters from
real financial data.

In this section we study the performance of both the standard and noisy
ABC MLE procedures when used to estimate the scale parameter of the
following toy economic model with intractable likelihoods. The hidden state
$\{X_k\}_{k \ge 0}$ takes values in the set $\{-1, 1\}$ and the corresponding Markov chain
has transition matrix

f f f V 

V 5 5 / 

Conditional on the hidden state the observed state
$Y_k \sim S_\alpha(\sigma, 0, X_k + \delta)$ where
$S_\alpha(\sigma, \beta, \delta)$ denotes the $\alpha$-stable distribution
with parameters $\alpha$, $\sigma$, $\beta$ and $\delta$, see for example
Samorodnitsky and Taqqu (1994). Intuitively the hidden state denotes the
health of the underlying economy, $1$ being good, i.e. growth, and $-1$
being bad, i.e. recession. Given the state of the economy the log returns
of the relevant asset price are then $\alpha$-stable distributed with
a positive or negative drift as appropriate.
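
Although the $\alpha$-stable likelihood is intractable, the model is easy to simulate from, which is all that ABC requires. The following Python sketch simulates the toy model using the Chambers-Mallows-Stuck representation for the symmetric case ($\beta = 0$). The function names and the `stay_prob` argument are assumptions for illustration: `stay_prob` is a placeholder for the probability of remaining in the current state and is not taken from the transition matrix above.

```python
import numpy as np

def sample_symmetric_stable(alpha, size, rng):
    """Standard symmetric alpha-stable draws (beta = 0) via Chambers-Mallows-Stuck."""
    u = rng.uniform(-np.pi / 2.0, np.pi / 2.0, size)
    w = rng.exponential(1.0, size)
    return (np.sin(alpha * u) / np.cos(u) ** (1.0 / alpha)
            * (np.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))

def simulate_toy_hmm(n, alpha, sigma, delta, stay_prob, rng):
    """Simulate hidden states in {-1, 1} and alpha-stable observations."""
    x = np.empty(n)
    x[0] = rng.choice([-1.0, 1.0])
    flips = rng.uniform(size=n) >= stay_prob   # True means switch state
    for k in range(1, n):
        x[k] = -x[k - 1] if flips[k] else x[k - 1]
    # Location X_k + delta and scale sigma, with beta = 0.
    y = x + delta + sigma * sample_symmetric_stable(alpha, n, rng)
    return x, y

rng = np.random.default_rng(0)
x, y = simulate_toy_hmm(n=1000, alpha=1.8, sigma=1.0, delta=0.0,
                        stay_prob=0.9, rng=rng)
```

For $\alpha = 2$ the sampler reduces to a Gaussian with variance 2, which gives a quick sanity check of the implementation.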
Fig 1: Asymptotic bias of ABC MLE parameter estimates.

In Figure 1 we plot the asymptotic bias of the ABC MLE when used to
estimate the parameters $\sigma$ and $\delta$ given that the true model
parameters are $\alpha = 1.8$, $\sigma = 1$ and $\delta = 0$. We note that
the ABC MLE seems to induce a bias in the estimates of the scale
parameters but not of the location parameters. Intuitively this can be
understood as being due to the fact that the observed states of the
perturbed HMMs (7) have a greater variance than those of the
corresponding original HMMs but the same mean position. Lastly we note
that for very small $\epsilon$ the size of the bias seems to be
$O(\epsilon^2)$, i.e. one order of magnitude less than the upper bound
obtained in Theorem 3.

Finally we investigate the behaviour of the noisy ABC MLE. In the first
graph in Figure 2 we plot the Fisher information matrix as a function of
$\epsilon$. The data suggests that for small $\epsilon$ the loss of
information is $O(\epsilon^2)$. In the second graph we plot the log of the
inverse of the Fisher information as a function of $\log \epsilon$. In
this case the resulting data suggests that the Fisher information in the
noisy ABC MLE decays as the inverse of the fourth power of $\epsilon$ for
sufficiently large values of $\epsilon$.

This second plot indicates that the Fisher information in the noisy ABC
MLE decays as the inverse fourth power of $\epsilon$, at least for large
values of $\epsilon$. This suggests that in order for ABC MLE to provide
accurate parameter estimates one must use relatively small values of
$\epsilon$. However, this conflicts with the need to keep $\epsilon$
reasonably large in order to achieve computational stability: even in
this simple 1-D linear model we had to use large numbers of particles in
our SMC algorithms to obtain accurate estimates of the ABC likelihoods
for small values of $\epsilon$. In higher dimensions this problem will be
even worse as the volumes of the $\epsilon$-balls around the observations
will decay even more quickly with $\epsilon$ than in the one dimensional
case. This suggests that in order for ABC to become a truly practical
statistical method one needs to find algorithms that can generate samples
from arbitrarily small neighborhoods of a point in an efficient manner.
One way in which this may be done is to marry ABC with techniques from
the rapidly growing field of rare event simulation (see Rubino and Tuffin
(2009) for a recent overview of this area).

Fig 2: Fisher Information of noisy ABC MLE parameter estimates.

8. Summary. In this article we have investigated the behaviour of the 
ABC and noisy ABC MLEs when used for estimating the parameters of 
HMMs. We have shown that mathematically these estimators should both be 
understood as being MLEs implemented using the likelihoods of a collection 
of perturbed HMMs. Using this insight we have shown that the standard 
ABC MLE has an innate asymptotic bias which can be made arbitrarily 
small by choosing a sufficiently small value of the parameter e. Further we 
have shown that the noisy ABC MLE provides an asymptotically consistent 
estimator which is also, under certain conditions analogous to those for the 
MLE, asymptotically normal. Moreover this noisy version of the estimator 
has a loss of information relative to the MLE which manifests itself via an 
increase in the variance of the parameter estimates. Finally we have shown 
that under very mild conditions these results can be extended to smoothed 
versions of the standard and noisy ABC MLEs. 

These theoretical results help to solidify and extend existing intuition
associated with the approximations that have been considered. Further they
suggest some possible avenues for future investigation. Firstly, one would
expect that the theoretical results in this paper will hold under much
weaker assumptions than those presented here. The question of finding the
necessary mathematical tools to relax these assumptions remains an
interesting and important open problem. Secondly, the numerical results
suggest that in order to provide an efficient and accurate method of
parameter estimation ABC MLE will in practice need to be combined with
computational techniques that allow one to generate samples effectively
from sets with very small probabilities. The question of finding a
generally applicable method of doing this is the topic of our current
research.

Appendix A: Auxiliary Results. Here we present some supporting 
technical lemmas. The first result is a standard result from real analysis 
which we state without proof. 

Lemma 5. Suppose that there exists a function f : ^ W and se- 
quence of continuously differentiable functions fn : — t- R", n > 1, such 
that fn{z)-,^ efn{z) o,fG hounded uniformly in n and z, fn{z) — )■ f{z) uni- 
formly in z and the sequence Vefniz) is Cauchy uniformly in z. Then f 
is itself uniformly bounded and continuously differentiable and V0f{z) = 
lim„_^oo ^efniz) uniformly in z. 

The second lemma is concerned with the identifiability of probability dis- 
tributions under additive noise. 

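Lemma 6. Let distributions $\mu_1, \mu_2$ and $\nu$ on $\mathbb{R}^m$ for some $m \ge 1$ be
given and suppose that the characteristic function of $\nu$ is equal to zero on a
set of Lebesgue measure zero. Then

$\mu_1 * \nu = \mu_2 * \nu$ if and only if $\mu_1 = \mu_2$.

Proof. For any distribution $\mu$ we shall let $\varphi_\mu(\lambda)$ denote the
corresponding characteristic function. It is well known that for any pair
of distributions $\mu$ and $\nu$, $\varphi_{\mu * \nu}(\lambda) = \varphi_\mu(\lambda) \varphi_\nu(\lambda)$ and that
$\mu = \nu$ if and only if $\varphi_\mu(\lambda) = \varphi_\nu(\lambda)$ for all $\lambda$. Thus we have that

$\mu_1 * \nu = \mu_2 * \nu \iff \varphi_{\mu_1}(\lambda) \varphi_\nu(\lambda) = \varphi_{\mu_2}(\lambda) \varphi_\nu(\lambda)$ for all $\lambda$ $\iff \mu_1 = \mu_2$,

where the last equivalence holds because $\varphi_\nu$ is non-zero almost
everywhere and characteristic functions are continuous. □

The following three Lemmas are well known results concerned with the
connectedness of the support of a measure. We state them without proof.

Lemma 7. Let a probability distribution $\mu$ on $\mathbb{R}^m$ for some $m \ge 1$ be
given. Then for all $\epsilon > 0$ the set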



$\{ y \in \mathbb{R}^m : \mu(B_y^\epsilon) = 0 \}$

is measurable and has $\mu$-measure zero.

Lemma 8. If the support of $\mu$ is connected then so is the support of the
n-fold product measure $\mu^{\otimes n}$ for any $n \ge 1$.

Lemma 9. Suppose that the support of a probability measure $\mu$ on $\mathbb{R}^m$ is
connected (see Section 2.1), then so is the support of the probability measure
$\mu * \mathcal{U}_{B_0^\epsilon}$ for any $\epsilon > 0$.

The next lemma shows that adding noise to an observation will, in general, 
result in a loss of information. The lemma after shows that for very large 
amounts of noise the loss in information will be almost complete. 

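Lemma 10. Suppose that there exists a collection of distributions $F_\theta$ on
some $\mathcal{Y} \subset \mathbb{R}^m$ parameterised by $\theta \in \Theta$ and with densities $p_\theta(\cdot)$ with respect
to some common finite dominating measure $\mu$, and that the densities $p_\theta(\cdot)$
are differentiable w.r.t. $\theta$. For all $\theta \in \Theta$ and $\epsilon > 0$ let $F_\theta^\epsilon = F_\theta * \mathcal{U}_{B_0^\epsilon}$. Then
for any $\theta \in \Theta$ and $\epsilon > 0$

(24) $E_{p_\theta}\left[ \nabla_\theta \log p_\theta(Y) \, \nabla_\theta \log p_\theta(Y)^T \right] \ge E_{p_\theta^\epsilon}\left[ \nabla_\theta \log p_\theta^\epsilon(Y) \, \nabla_\theta \log p_\theta^\epsilon(Y)^T \right]$

where $p_\theta^\epsilon(\cdot)$ denotes the density of the distribution $F_\theta^\epsilon$ with respect to the
finite dominating measure $\mu * \mathcal{U}_{B_0^\epsilon}$. Furthermore, if the supports of the
distributions $F_\theta$ are all connected then we have equality in (24) if and only if
both quantities are equal to the zero matrix.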

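Proof. Let $\theta \in \Theta$ be given and let $Y$ be a random variable
distributed according to $p_\theta(\cdot)$. Observe that given $\epsilon$
the quantity $p_\theta^\epsilon(\cdot)$ is equal to the density of the
random variable $Y^\epsilon = Y + \epsilon Z$ (with respect to the
appropriate dominating measure) where $Z$ is an independent random
variable and $Z \sim \mathcal{U}_{B_0^1}$. By a straightforward
application of the Fisher identity and the fact that
$p_\theta(Y, Y^\epsilon) = p_\theta(Y) I_{B_0^\epsilon}(Y^\epsilon - Y)$
one has that
$\nabla_\theta \log p_\theta^\epsilon(Y^\epsilon) = E[\nabla_\theta \log p_\theta(Y, Y^\epsilon) | Y^\epsilon] = E[\nabla_\theta \log p_\theta(Y) | Y^\epsilon]$
a.s., where $p_\theta(\cdot, \cdot)$ denotes the joint density of the
random variables $Y, Y^\epsilon$, from which it follows that for any
vector $v$ of the same dimension as $\theta$,
$v^T \nabla_\theta \log p_\theta^\epsilon(Y^\epsilon) = E[v^T \nabla_\theta \log p_\theta(Y) | Y^\epsilon]$.
Furthermore, given such a $v$ we have that

(25)
$v^T E_{p_\theta}\left[ \nabla_\theta \log p_\theta(Y) \, \nabla_\theta \log p_\theta(Y)^T \right] v = E_{p_\theta}\left[ v^T \nabla_\theta \log p_\theta(Y) \, \nabla_\theta \log p_\theta(Y)^T v \right]$,
$v^T E_{p_\theta^\epsilon}\left[ \nabla_\theta \log p_\theta^\epsilon(Y^\epsilon) \, \nabla_\theta \log p_\theta^\epsilon(Y^\epsilon)^T \right] v = E_{p_\theta^\epsilon}\left[ v^T \nabla_\theta \log p_\theta^\epsilon(Y^\epsilon) \, \nabla_\theta \log p_\theta^\epsilon(Y^\epsilon)^T v \right]$.

Applying Jensen's inequality to (25) yields

$v^T E_{p_\theta}\left[ \nabla_\theta \log p_\theta(Y) \, \nabla_\theta \log p_\theta(Y)^T \right] v \ge v^T E_{p_\theta^\epsilon}\left[ \nabla_\theta \log p_\theta^\epsilon(Y^\epsilon) \, \nabla_\theta \log p_\theta^\epsilon(Y^\epsilon)^T \right] v$

for all $v$, from which (24) immediately follows.

We now prove the second assertion. Since the mapping
$z \in \mathbb{R} \mapsto z^2$ is strictly convex it further follows from
Jensen's inequality that for any $v$

$v^T E_{p_\theta}\left[ \nabla_\theta \log p_\theta(Y) \, \nabla_\theta \log p_\theta(Y)^T \right] v = v^T E_{p_\theta^\epsilon}\left[ \nabla_\theta \log p_\theta^\epsilon(Y^\epsilon) \, \nabla_\theta \log p_\theta^\epsilon(Y^\epsilon)^T \right] v$

if and only if $v^T \nabla_\theta \log p_\theta(Y)$, and hence
$v^T \nabla_\theta \log p_\theta(Y, Y^\epsilon)$, is $\sigma(Y^\epsilon)$
measurable. Thus equality holds in (24) if and only if
$v^T \nabla_\theta \log p_\theta(Y, Y^\epsilon)$ is $\sigma(Y^\epsilon)$
measurable for all $v$, which holds if and only if
$\nabla_\theta \log p_\theta(Y, Y^\epsilon)$ is $\sigma(Y^\epsilon)$
measurable. Hence in order to prove the final part of the result it is
sufficient to show that $\nabla_\theta \log p_\theta(Y, Y^\epsilon)$ is
$\sigma(Y^\epsilon)$ measurable if and only if it is equal to zero a.s.
Assume that $\nabla_\theta \log p_\theta(Y, Y^\epsilon)$ is
$\sigma(Y^\epsilon)$ measurable. Then
$\nabla_\theta \log p_\theta^\epsilon(Y^\epsilon) = \nabla_\theta \log p_\theta(Y, Y^\epsilon)$
a.s. Using the fact that
$\nabla_\theta \log p_\theta(y, y^\epsilon) = \nabla_\theta \log p_\theta(y) I_{B_0^\epsilon}(y^\epsilon - y)$
one then has that

(26) $\nabla_\theta \log p_\theta(y) = \nabla_\theta \log p_\theta(y')$

for $F_\theta$ a.s. all $y, y'$ such that $|y - y'| < 2\epsilon$.

Suppose now that $\nabla_\theta \log p_\theta(Y)$ is not $F_\theta$ a.s.
constant. Then there must exist $v$ and $\eta$ such that
$F_\theta(|\nabla_\theta \log p_\theta(Y) - v| < \eta) > 0$ and
$F_\theta(|\nabla_\theta \log p_\theta(Y) - v| > \eta) > 0$. It then
follows from Lemma 7 that there must exist points $y$ and $y'$ such that
for all $\delta > 0$

(27)
$F_\theta(|Y - y| < \delta, \ |\nabla_\theta \log p_\theta(Y) - v| < \eta) > 0$,
$F_\theta(|Y - y'| < \delta, \ |\nabla_\theta \log p_\theta(Y) - v| > \eta) > 0$.

Since the support of $F_\theta$ is connected there exists a continuous
curve $C : [0, 1] \to \mathbb{R}^m$ contained in the support of $F_\theta$
such that $C(0) = y$ and $C(1) = y'$. By the continuity of $C$ one can
find a finite sequence of open balls $B_1, \ldots, B_n$ of radius less
than or equal to $\epsilon$ such that $y \in B_1$, $y' \in B_n$,
$C \subset \cup_{k=1}^n B_k$ and such that for every $1 \le k < n$,
$B_k \cap B_{k+1} \cap C \neq \emptyset$. Consider any two neighbouring
balls $B_k$ and $B_{k+1}$. From (26) we have that
$\nabla_\theta \log p_\theta(\cdot)$ is $F_\theta$ a.s. constant on $B_k$
and on $B_{k+1}$, and that there exists some ball contained in
$B_k \cap B_{k+1}$ with non zero $F_\theta$ mass, and thus that
$\nabla_\theta \log p_\theta(\cdot)$ is $F_\theta$ a.s. constant on
$B_k \cup B_{k+1}$. Hence it follows that
$\nabla_\theta \log p_\theta(\cdot)$ is $F_\theta$ a.s. constant on
$\cup_{k=1}^n B_k$, which contradicts the assumption that
$\nabla_\theta \log p_\theta(Y)$ is not $F_\theta$ a.s. constant. Thus it
follows that if $\nabla_\theta \log p_\theta(Y, Y^\epsilon)$ is
$\sigma(Y^\epsilon)$ measurable then $\nabla_\theta \log p_\theta(\cdot)$
must be $F_\theta$ a.s. equal to some constant $K$. Further, since
$E[\nabla_\theta \log p_\theta(Y)] = 0$ it then follows that $K = 0$.
Conversely if $\nabla_\theta \log p_\theta(\cdot) = 0$ a.s. then clearly it
is $\sigma(Y^\epsilon)$ measurable. □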

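Lemma 11. Suppose that there exists a collection of distributions $F_\theta$ on
some $\mathcal{Y} \subset \mathbb{R}^m$ parameterised by the parameter vector $\theta \in \Theta$. Assume that
for every $\theta$ the corresponding distribution has a density $p_\theta(\cdot)$ with respect
to some common finite dominating measure $\mu$, that the densities $p_\theta(\cdot)$ are
continuously differentiable w.r.t. $\theta$ and that the corresponding score functions
$\nabla_\theta \log p_\theta(\cdot)$ are uniformly bounded above in norm by some $K < \infty$.
For all $\theta$ and $\epsilon$ let $F_\theta^\epsilon = F_\theta * \mathcal{U}_{B_0^\epsilon}$. Then for any $\theta$ and any sequence of positive
real numbers $\epsilon_n$ such that $\epsilon_n \to \infty$

$\lim_{n \to \infty} F_\theta^{\epsilon_n}\left( \{ y : |\nabla_\theta \log p_\theta^{\epsilon_n}(y)| > \delta \} \right) = 0$

for all $\delta > 0$, where $p_\theta^{\epsilon_n}$ denotes the density of the distribution $F_\theta^{\epsilon_n}$ with
respect to the finite dominating measure $\mu * \mathcal{U}_{B_0^{\epsilon_n}}$.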

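Proof. Let $\theta \in \Theta$ be given and let $Y$ be a random variable
distributed according to $F_\theta$. As in the proof of Lemma 10 we
observe that given $\epsilon$ the quantity $p_\theta^\epsilon(\cdot)$ is
equal to the density of the random variable
$Y^\epsilon = Y + \epsilon Z$ (again with respect to the appropriate
dominating measure) where $Z$ is an independent random variable with
$Z \sim \mathcal{U}_{B_0^1}$. Standard computations show that for any $y$

$\nabla_\theta \log p_\theta^{\epsilon_n}(y)
 = \frac{\nabla_\theta \int p_\theta(z) I_{B_0^{\epsilon_n}}(y - z) \, \mu(dz)}{\int p_\theta(z) I_{B_0^{\epsilon_n}}(y - z) \, \mu(dz)}
 = \frac{\nabla_\theta \int p_\theta(z) \left( 1 - I_{(B_0^{\epsilon_n})^c}(y - z) \right) \mu(dz)}{\int p_\theta(z) I_{B_0^{\epsilon_n}}(y - z) \, \mu(dz)}
 = - \frac{\int \nabla_\theta p_\theta(z) I_{(B_0^{\epsilon_n})^c}(y - z) \, \mu(dz)}{\int p_\theta(z) I_{B_0^{\epsilon_n}}(y - z) \, \mu(dz)}$

where the last equality follows from the dominated convergence theorem
by (A2), (A3), (A4) and (A5). Since
$|\nabla_\theta \log p_\theta(y)| \le K$ it follows that

$\left| \int \nabla_\theta p_\theta(z) I_{(B_0^{\epsilon_n})^c}(y - z) \, \mu(dz) \right| \le K F_\theta\left( Y \in (B_y^{\epsilon_n})^c \right)$

for all $y$. Hence the proof will follow once we establish that for any
$\delta'$

(28) $\limsup_{n \to \infty} F_\theta^{\epsilon_n}\left( \{ y : F_\theta(Y \in B_y^{\epsilon_n}) < 1 - \delta' \} \right) < \delta'$.

Note that given any $\delta'$ there exist $R < \infty$ and $r < 1$ such
that $F_\theta(Y \in B_0^R), \ P(Z \in B_0^r) > 1 - \delta'/2$ and thus
that for any $\epsilon_n > 2R/(1 - r)$ we have that
$F_\theta^{\epsilon_n}(B_0^{\epsilon_n - R}) > 1 - \delta'$. Clearly if
$y \in B_0^{\epsilon_n - R}$ then
$F_\theta(Y \in B_y^{\epsilon_n}) \ge F_\theta(Y \in B_0^R) > 1 - \delta'/2$
and so the result follows. □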



The following result establishes a stability-like property of the filter
as the amount of noise in certain components of the observations becomes
infinite. Before we state the result we recall the extended HMM defined in
Section 6. Given a HMM $\{X_k, Y_k\}_{k \ge 0}$ and a perturbed version
$\{X_k, Y_k^\epsilon\}_{k \ge 0}$ (see (7)) we define the extended HMM to
be the joint process $\{X_k, Y_k, Y_k^\epsilon\}_{k \ge 0}$. In other
words, given a HMM $\{X_k, Y_k\}_{k \ge 0}$ and some $\epsilon > 0$ the
extended HMM is the process

(29) $\{X_k, Y_k, Y_k^\epsilon\}_{k \ge 0} := \{X_k, Y_k, Y_k + \epsilon Z_k\}_{k \ge 0}$

where $\{Z_k\}_{k \ge 0}$ is such that for each $k \ge 0$,
$Z_k \sim \mathcal{U}_{B_0^1}$ i.i.d.

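Lemma 12. Let $\{X_k, Y_k\}_{k \ge 0}$ be a HMM which satisfies (A3) and let
$\{X_k, Y_k, Y_k^\epsilon\}_{k \ge 0}$ be the corresponding extended HMM defined in (29). Then
for any $l < m$, sequences $j_1 < \cdots < j_r$, $j'_1 < \cdots < j'_s$, any $j < \min\{l, j_1, j'_1\}$,
$x \in \mathcal{X}$ and $\delta > 0$

(30) $\lim_{\epsilon \to \infty} P\left( \left\| P\left( X_{l:m} \,\middle|\, Y_{j_{1:r}}; Y^\epsilon_{j'_{1:s}}; X_j = x \right) - P\left( X_{l:m} \,\middle|\, Y_{j_{1:r}}; X_j = x \right) \right\|_{TV} > \delta \right) = 0$.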



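Proof. Clearly we can assume that
$\{j_1, \ldots, j_r\} \cap \{j'_1, \ldots, j'_s\} = \emptyset$. Let
$k = \max\{m, j_r, j'_s\}$, then using assumption (A3) and the well known
identity

(31) $p\left( x_{l:m} \,\middle|\, Y_{j_{1:r}}; Y^\epsilon_{j'_{1:s}}; X_j = x \right)
 = \frac{\int \prod_{u=j+1}^{k} q(x_{u-1}, x_u) \prod_{v=1}^{r} g(Y_{j_v} | x_{j_v}) \prod_{w=1}^{s} g^\epsilon(Y^\epsilon_{j'_w} | x_{j'_w}) \, d\mu(x_{j+1:l-1; \, m+1:k})}{\int \prod_{u=j+1}^{k} q(x_{u-1}, x_u) \prod_{v=1}^{r} g(Y_{j_v} | x_{j_v}) \prod_{w=1}^{s} g^\epsilon(Y^\epsilon_{j'_w} | x_{j'_w}) \, d\mu(x_{j+1:k})}$

where $g^\epsilon(\cdot|\cdot)$ is as in (6), it follows that in order to
show (30) it is sufficient to show that for any $l$ and $\delta > 0$

(32) $\lim_{\epsilon \to \infty} P\left( \sup_{x, x' \in \mathcal{X}} \left| \frac{g^\epsilon(Y_l^\epsilon | x)}{g^\epsilon(Y_l^\epsilon | x')} - 1 \right| > \delta \right) = 0$.

In order to prove (32) it is sufficient, by assumption (A3), to show that
for any $\delta > 0$

(33) $\lim_{\epsilon \to \infty} \nu * \mathcal{U}_{B_0^\epsilon}\left( y : \sup_{x, x' \in \mathcal{X}} \left| \frac{\int_{B_y^\epsilon} g(y'|x) \, \nu(dy')}{\int_{B_y^\epsilon} g(y'|x') \, \nu(dy')} - 1 \right| > \delta \right) = 0$.

By assumption (A3) we have that for any $\delta' > 0$ there exists some
$R_{\delta'} < \infty$ such that for all $x \in \mathcal{X}$

$\int_{(B_0^{R_{\delta'}})^c} g(y|x) \, \nu(dy) < \delta'$.

It then follows that given the above $\delta$ there exists some
$R_\delta < \infty$ such that

$\sup_{x, x' \in \mathcal{X}} \left| \frac{\int_{B_y^\epsilon} g(y'|x) \, \nu(dy')}{\int_{B_y^\epsilon} g(y'|x') \, \nu(dy')} - 1 \right| < \delta$

for all $y$ such that $B_0^{R_\delta} \subset B_y^\epsilon$. Thus in
order to prove (33) it is sufficient to show that for any $R > 0$,
$\lim_{\epsilon \to \infty} \nu * \mathcal{U}_{B_0^\epsilon}\left( (B_0^{\epsilon - R})^c \right) = 0$.
However for any $r \in (0, 1)$ we have that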



hm sup u * Ub- ( {Bl'^f\ < hm sup u * Ub- ({B^^ ''^'f 



<nmsup(z.((SS^)^) +UBf^ m 



from which the result follows. 



□ 



The next five results are restatements of certain well-known stability prop- 
erties of the filter. 

Lemma 13. Let {Xk^Y^} be a HMM which satisfies (A3) and let the 
process {Xk,Yk,Y^} he the corresponding extended HMM defined as in (29). 
Then for all k < I < m < n, ji < ■ ■ ■ < jr and ji < • • • < jg such 
that ji A ji > k, jr y js < n, all G X and all sequences 

Y Y ■ Y^ 



(34) 
and 



■^l:'m\Yj-i.j. ,Yj^^, Xj^ — Xfc 



^i.m.\Yj-^.^;Yj^^; Xk — Xk;X- 



Xi:m\Yjj^.^; Yj^ ^; Xk — x'^ 



TV 



n — ■^n 



(35) 

where p = (l — cf /cf) . 
Proof. Equations (34) and (35) follow immediately from standard results
in the literature, see for example Del Moral (2004) and Cappe, Ryden
and Moulines (2005). □

Corollary 1. Let {Xk,Yk} be a HMM which satisfies (A3) and let 
the process {Xk,Yk,Y^} be the corresponding extended HMM defined as in 
(29). Then for all I < m and infinite sequences ■ ■ ■ ,j~i,jo md jo,ji, ■ ■ ■ the 
conditional probability laws p (^Xirn\Yj_^.Q^ and p (^Xi.„i\Yj_^.^^]Y-^^ ^ exist 
and are well defined. Further for any x G X 

(36) \\F(^X,.^\Y,_,.,;Ylj,X.k = x) - P Y,_„; 1^^^ ) ^ 0, 



TV 

(37) ||P (X,^|y,_,^„; = x)-F {X,JY,_.,) \\^^ ^ 

as k,n ^ oo. 

Proof. Equations (36) and (37) are simple consequences of (34). □ 

Corollary 2. Let $\{X_k, Y_k\}$ be a HMM which satisfies (A3) and let
$\{X_k, Y_k, Y_k^\epsilon\}$ be the corresponding extended HMM defined as in (29). Then
for all $k < l$, $j_1 < \cdots < j_r$ and $j'_1 < \cdots < j'_s$ such that $j_1 \wedge j'_1 > k$, all $x \in \mathcal{X}$
and all sequences $Y_{j_1}, \ldots, Y_{j_r}; Y^\epsilon_{j'_1}, \ldots, Y^\epsilon_{j'_s}$

(38) $\frac{c_1^3}{c_2^2} \le p\left( x_l \,\middle|\, Y_{j_{1:r}}; Y^\epsilon_{j'_{1:s}}; X_k = x \right) \le \frac{c_2^3}{c_1^2}$

where the constants $c_1, c_2$ are as in (A3) and the central quantity in (38)
denotes the density of the corresponding conditional probability with respect
to the dominating measure $\mu$.

Proof. To simplify the exposition we shall only give a proof of (38) for
conditional probabilities of the form $p(x_l | Y_{j_{1:r}}; X_k = x)$, the proof in the general case
following in an identical manner.

It is clear by (A3) that when $j_r < l$

(39) $c_1 \le p(x_l | Y_{j_{1:r}}; X_k = x) \le c_2$.

Consider the case when $j_r \ge l$. Let $r'$ be such that $j_{r'-1} < l \le j_{r'}$. By (A3)
we have

$p(Y_{j_{r':r}} | X_l = x_l) \le \left( \frac{c_2}{c_1} \right)^2 p(Y_{j_{r':r}} | X_l = x'_l)$

for any $x'_l$. Note that if $l < j_{r'}$ one obtains the tighter bound
$p(Y_{j_{r':r}} | X_l = x_l) \le (c_2/c_1) \, p(Y_{j_{r':r}} | X_l = x'_l)$. Thus

$p(x_l | Y_{j_{1:r}}; X_k = x)
 = \frac{p(x_l | Y_{j_{1:r'-1}}; X_k = x) \, p(Y_{j_{r':r}} | X_l = x_l)}{\int p(x'_l | Y_{j_{1:r'-1}}; X_k = x) \, p(Y_{j_{r':r}} | X_l = x'_l) \, \mu(dx'_l)}
 \le p(x_l | Y_{j_{1:r'-1}}; X_k = x) \left( \frac{c_2}{c_1} \right)^2$

and the upper bound in (38) is obtained using (39). The lower bound in (38)
is proved similarly. □

Corollary 3. Let $\{X_k, Y_k\}$ be a HMM which satisfies (A3) and let
$\{X_k, Y_k, Y_k^\epsilon\}$ be the corresponding extended HMM defined as in (29). Then
for all $k < l < l' < m < m'$, $j_1 < \cdots < j_r$ and $j'_1 < \cdots < j'_s$ such that
$j_1 \wedge j'_1 > k$, and all $f, h \in L_\infty$, $x \in \mathcal{X}$



E 
(40) 



fiX,M)\Yj,.,^;Yf ■,Xk = x .E hiX^.,^,)\Yj,.,^;Yf -Xt 

Jl:s J [_ Jl:s 



-E 



< 



f{Xi.r)h{Xrn:„,')\Yj,./,YljXk = X 

where $\rho$ is as in Lemma 13.

Proof. Let

$$
\Delta H = h(X_{m:m'}) - E\left[h(X_{m:m'}) \mid Y_{j_{1:r}}; Y^\epsilon_{j'_{1:s}}; X_k = x\right].
$$

It follows from (34) that 



m-V 



OO II " II oo 



E 



^H\Yj,.,^-Yl-Xk = x;Xi 



< 



m-V 



,P 



The proof is completed by noting that the difference of the two expectations
in (40) can be expressed, up to sign, as

$$
E\left[f(X_{l:l'})\, \Delta H \mid Y_{j_{1:r}}; Y^\epsilon_{j'_{1:s}}; X_k = x\right].
$$



□ 



Remark 7. The proof of Corollary 3 actually yields the stronger result
that the left hand side of (40) is bounded above by

$$
\|h\|_\infty\, \rho^{m-l'}\, E\left[\, |f(X_{l:l'})| \,\middle|\, Y_{j_{1:r}}; Y^\epsilon_{j'_{1:s}}; X_k = x \right].
$$

Corollary 4. Let $\{X_k, Y_k\}$ be a HMM which satisfies (A3) and let
$\{X_k, Y_k, Y_k^\epsilon\}$ be the corresponding extended HMM defined as in (29). Then
for all k' < k < I < m, ji < ■ ■ ■ < jr and ji < • • • < js such that ji Aji > k' , 
f G Loo, x,x' € X and 1 < < rg < r, 1 < sj, < < s such that 
jrt A jst > k, jr, A js, > m and I > jr^ V js^ we have that 



E 



f{X,,m)\Yj,.,.;Y{ 



Jl:s- 



E 



f{Xl:. 



(41) 



< 2 



■,Xk = X 

m)A(Z-irjVisft) 



where $\rho$ is as in Lemma 13.



Proof. It is clear that Yj C Yj, and Y-^ C Y-^ . By conditioning
on X- and X- , the difference of the two expectations on the left
hand side of (41) can be expressed as



E 



f{Xl:m)\Yj^ '.,Y- ;x ^ ;x. -. 



-E 



f {Xi.,m)\Yj,.^.^^ ; Yj^^^^^ ; x^v^vj,, ! Aj,, 



xp[x'. ,x'. -. \Yi. lY-"^ ;Xfc/ = x' 



|y._;17 ■,X, = x 



X /i((ix'. ,'. )u(dx'. .-. )u(dx- )u(dx- ). 

The result now follows by bounding the difference of the two conditional 
expectations in the integrand using (35). □ 



Remark 8. Using exactly the same proofs as above one can show that
the conclusions of Corollaries 3 and 4 and Remark 7 are still valid if the
functions $f(X_{l:l'})$, $h(X_{m:m'})$ and $f(X_{l:m})$ in the statements of those results
are replaced with the functions $f(X_{l:l'}, Y_{l:l'})$, $h(X_{m:m'}, Y_{m:m'})$ and
$f(X_{l:m}, Y_{l:m})$.

The next result establishes certain properties of the gradient of the filter 
conditioned on the infinite past, see Le Gland and Mevel (2000) or Tadic 
and Doucet (2005) for further information concerning the gradient of the 
filter. 



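Lemma 14. Let $\{X_k, Y_k\}$ be a parameterised collection of HMMs which
satisfy (A3)-(A5) and let $\{X_k, Y_k, Y_k^\epsilon\}$ be the corresponding extended HMMs
defined as in (29). Then for all $\theta \in G$, where $G$ is as in assumptions (A4)
and (A5), and every sequence of observations $\ldots, Y_{-1}; Y_1^\epsilon, \ldots$ there exists an
$\mathbb{R}^d$ valued function $\nabla_\theta p_{\theta; Y_{-\infty:-1}; Y^\epsilon_{1:\infty}}(x_0)$ in $L_1(\mu)$ such that for all
$k, n > 0$, $x \in \mathcal{X}$

$$
\sup_{f : \|f\|_\infty \le 1}
\left| \int f(x_0)\, \nabla_\theta p_{\theta; Y_{-\infty:-1}; Y^\epsilon_{1:\infty}}(x_0)\, \mu(dx_0)
- \int f(x_0)\, \nabla_\theta p_\theta\left(x_0 \mid Y_{-n:-1}; Y^\epsilon_{1:k}; X_{-n} = x\right) \mu(dx_0) \right|
\le C \rho^{\frac{n}{2} \wedge \frac{k}{2}}
\tag{42}
$$

where $\rho$ is as in Lemma 13, $C < \infty$ is a global constant independent of $\theta$ and
$\ldots, Y_{-1}; Y_1^\epsilon, \ldots$ and $\nabla_\theta p_\theta(x_0 \mid Y_{-n:-1}; Y^\epsilon_{1:k}; X_{-n} = x)$ denotes the gradient of
the density of the conditional law $P_\theta(x_0 \mid Y_{-n:-1}; Y^\epsilon_{1:k}; X_{-n} = x)$ w.r.t. $\mu$.
Furthermore there exists $K < \infty$ such that for all $k, n > 0$, $x \in \mathcal{X}$ and $\theta \in G$

$$
\left| \nabla_\theta p_\theta\left(x_0 \mid Y_{-n:-1}; Y^\epsilon_{1:k}; X_{-n} = x\right) \right|, \quad
\left| \nabla_\theta p_{\theta; Y_{-\infty:-1}; Y^\epsilon_{1:\infty}}(x_0) \right| \le K
\tag{43}
$$

almost surely. Finally we have that for any $f \in L_\infty$

$$
\nabla_\theta \int f(x_0)\, p_\theta\left(x_0 \mid Y_{-\infty:-1}; Y^\epsilon_{1:\infty}\right) \mu(dx_0)
= \int f(x_0)\, \nabla_\theta p_{\theta; Y_{-\infty:-1}; Y^\epsilon_{1:\infty}}(x_0)\, \mu(dx_0),
\tag{44}
$$

where (44) defines a continuous function of $\theta$ on $G$.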

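Proof. We begin by proving (42) and (43). First note that since it is
sufficient to prove the results component wise with respect to the vectors
$\nabla_\theta p_\theta(\cdot \mid \cdots)$ and $\nabla_\theta p_{\theta; \cdots}(\cdot)$ we can assume that $d = 1$. For any suitable $x$,
$f$, $n$ and $k$

$$
\begin{aligned}
& \int f(x_0)\, \nabla_\theta p_\theta\left(dx_0 \mid Y_{-n:-1}; Y^\epsilon_{1:k}; X_{-n} = x\right) \\
&= \sum_{j=-n}^{-1} E_\theta\left[ f(X_0)\, \nabla_\theta \log\left(q_\theta(X_j, X_{j+1})\, g_\theta(Y_j \mid X_j)\right) \mid Y_{-n:-1}; Y^\epsilon_{1:k}; X_{-n} = x \right]
   && (45) \\
&\quad - \sum_{j=-n}^{-1} E_\theta\left[ f(X_0) \mid Y_{-n:-1}; Y^\epsilon_{1:k}; X_{-n} = x \right] \times
   && (46) \\
&\qquad\quad E_\theta\left[ \nabla_\theta \log\left(q_\theta(X_j, X_{j+1})\, g_\theta(Y_j \mid X_j)\right) \mid Y_{-n:-1}; Y^\epsilon_{1:k}; X_{-n} = x \right]
   && (47) \\
&\quad + \sum_{l=1}^{k} E_\theta\left[ f(X_0)\, \nabla_\theta \log\left(q_\theta(X_{l-1}, X_l)\, g^\epsilon_\theta(Y^\epsilon_l \mid X_l)\right) \mid Y_{-n:-1}; Y^\epsilon_{1:k}; X_{-n} = x \right]
   && (48) \\
&\quad - \sum_{l=1}^{k} E_\theta\left[ f(X_0) \mid Y_{-n:-1}; Y^\epsilon_{1:k}; X_{-n} = x \right] \times
   && (49) \\
&\qquad\quad E_\theta\left[ \nabla_\theta \log\left(q_\theta(X_{l-1}, X_l)\, g^\epsilon_\theta(Y^\epsilon_l \mid X_l)\right) \mid Y_{-n:-1}; Y^\epsilon_{1:k}; X_{-n} = x \right].
\end{aligned}
$$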

By (A3), (A5), (41) and Remark 8 we have that for all $f$ with $\|f\|_\infty \le 1$,
$x, x' \in \mathcal{X}$, $\theta \in G$, $k, k', n, n' > 0$ and $j$ such that $-n' < -n < j < k < k'$

|E [f{Xo)Velog{qe{X„X,+i)ge{Yj\Xj)) |y_„._i; y^^^^; X_„ = x] 

-E [f{Xo)Ve log {qe{Xj,Xj+MY,\Xj)) |y_„._i; ^i^^^,; = x'] \ 

(50) < ?^Cp(^'+'^)^('=-^'-i) 
and 

|E[/(Xo)|y-„:_i;yi^^,,;X_„ = x]x 

E [Ve log {qe{Xj,Xj+i)ge{Y,\Xj)) \Y^n:^i;Y{,^;X.n = x] 

- E [fiXo)\Y.n':-i; yl.k'\X-n' = x] X 

E [Ve log {qe{X,,X,+i)ge{Y,\X,)) \Y_^,.,_r,Y{.,,,; X_^, = x']\ . 

(51) < 4C^ ( 1 + C^-^) 



where $\rho$ is as in Lemma 13, $C$ is as in Corollary 4 and $\underline{c}_1, \bar{c}_1, c_2$ are as in
assumptions (A3) and (A5). Further by (A3), (A5) and (40) it follows that
for all $x \in \mathcal{X}$, $\theta \in G$, $k, n > 0$, $j$ and $f$ that

|E[/(Xo)Velog {qe{Xj,X,+MYj\Xj)) \Y.n:-i;Y{,k,X_n = x] 
-E[f{Xo)\Y^n:-i;Yf.k;X^n = x]x 

E[Velog{qe{X„Xj+MY,\Xj)) \Y^n:^i;Y{.^^;X_n = x]\ 

(52) <
It thus follows from (45)-(52) that for all $\theta \in G$ and all $k, n \ge 1$

fixo)Vgpe (xo|ll„:_i; y/.^; X_„ = x) n{dxo) 



sup sup 

^.^'6'^/:||/lloo<l 



(53) <64C72^ 



-2— oo 
2C1C2 



r=-A- 
' 2 " 2 



Further the first part of (43) follows from (45)-(49), (A3) and (A5), the
uniform boundedness of the conditional probability densities
$p_\theta(x_0 \mid Y_{-n:-1}; Y^\epsilon_{1:k}; X_{-n} = x)$ (Corollary 2) and Remark 7. Let $K$ be the
constant bounding the first part of (43) and for any $x \in \mathcal{X}$, $k, n > 0$ and
observations $Y_{-n}, \ldots, Y_{-1}; Y_1^\epsilon, \ldots, Y_k^\epsilon$ let

$$
\nabla_\theta p^K_\theta\left(x_0 \mid Y_{-n:-1}; Y^\epsilon_{1:k}; X_{-n} = x\right)
= \nabla_\theta p_\theta\left(x_0 \mid Y_{-n:-1}; Y^\epsilon_{1:k}; X_{-n} = x\right) + K.
\tag{54}
$$

The functions $\nabla_\theta p^K_\theta(\cdot \mid \cdots)$ are densities with respect to $\mu$ of a collection
of (random) finite positive measures, each with total mass equal to $K$
and for which (53) clearly still holds. Since the space of positive finite
measures equipped with the total variation norm is a Banach space (see
e.g. Parthasarathy and Steerneman (1985)) it follows from (53) that given
a doubly infinite sequence of observations $\ldots, Y_{-1}; Y_1^\epsilon, \ldots$ there exists some
positive finite measure $\mu^K_{\theta; Y_{-\infty:-1}; Y^\epsilon_{1:\infty}}$ such that for any $n \ge 1$



sup sup 

^•6^/:|l/lloo<l 



-00 : — 1 ) J 



f{xo)VeP0 {xo\Y-n:-i; Yf.^; X-n = x) fi{dxo) 



(55) -J f{xo)ii^_^,_^.Y.^{dxo] 



< 64C2 



cfpii-pY 



It follows by definition that $\nabla_\theta p^K_\theta(\cdot \mid \cdots) \le 2K$ and thus from (55) that
$\mu^K_{\theta; Y_{-\infty:-1}; Y^\epsilon_{1:\infty}}$ has a density $\nabla_\theta p^K_{\theta; Y_{-\infty:-1}; Y^\epsilon_{1:\infty}}(\cdot)$ which is bounded above
by $2K$. Equation (42) and the second part of equation (43) now follow by
letting $\nabla_\theta p_{\theta; Y_{-\infty:-1}; Y^\epsilon_{1:\infty}}(\cdot) = \nabla_\theta p^K_{\theta; Y_{-\infty:-1}; Y^\epsilon_{1:\infty}}(\cdot) - K$. We shall prove (44)
by, for any $f \in L_\infty$, applying Lemma 5 to the sequence of functions

$$
E_\theta\left[ f(X_0) \mid Y_{-n:-1}; Y^\epsilon_{1:n}; X_{-n} = x \right]
\tag{56}
$$

for $n \ge 1$ and $x \in \mathcal{X}$ arbitrary. Clearly the functions in (56) are
continuously differentiable by (A2) and (A4). In order to be able to apply
Lemma 5 we further need to establish three things: that the functions in (56)
and their derivatives are uniformly bounded, which follows from (38) and (43);
that

$$
E_\theta\left[ f(X_0) \mid Y_{-n:-1}; Y^\epsilon_{1:n}; X_{-n} = x \right] \to E_\theta\left[ f(X_0) \mid Y_{-\infty:-1}; Y^\epsilon_{1:\infty} \right]
$$

uniformly, which follows from (36); and finally that the sequence of derivatives
of the functions in (56) is uniformly Cauchy, which follows from (42). □

Corollary 5. Assume the same conditions as in Lemma 14. Then
results analogous to those in (42), (43) and (44) hold for the gradients of 
the conditional densities pe (xo|l^-n;-i; -^-n = x), pg (xo|5^i„._i; ^-n = x) 
and Pe (xo|^-oo:-i; 5^m:oo)- Furthermore for every sequence of observations 
Y-oo:~i, ^i-oo '^'^'^ integer m>l 

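$$
\left| \nabla_\theta p_\theta\left(x_0 \mid Y_{-\infty:-1}; Y^\epsilon_{m:\infty}\right) - \nabla_\theta p_\theta\left(x_0 \mid Y_{-\infty:-1}\right) \right| \le C \rho^{\frac{m}{2}}
\tag{57}
$$

where $\rho$ is as in Lemma 13 and $C < \infty$ is a global constant independent of
$\theta$, $x_0$, $Y_{-\infty:-1}$, $Y^\epsilon_{1:\infty}$ and $m$.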

Proof. The first part of the corollary can be proved in exactly the same 
way as Lemma 14. 

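To prove the second part of the corollary it is sufficient to show that

$$
\left| \nabla_\theta p_\theta\left(x_0 \mid Y_{-n:-1}; Y^\epsilon_{m:n}; X_{-n} = x\right) - \nabla_\theta p_\theta\left(x_0 \mid Y_{-n:-1}; X_{-n} = x\right) \right| \le C \rho^{\frac{m}{2}}
\tag{58}
$$

for some $C$ and all $\theta$, $Y_{-\infty:-1}$, $Y^\epsilon_{1:\infty}$, $x$ and $n$. Inequality (58) can be
established by decomposing the two gradients that appear on its left hand
side in an analogous manner to (45)-(49). The bound on the right hand side
then follows by bounding the terms in this decomposition individually using
(50)-(52) and the fact that

$$
\left\| P_\theta\left(x_0 \mid Y_{-n:-1}; Y^\epsilon_{m:n}; X_{-n} = x\right) - P_\theta\left(x_0 \mid Y_{-n:-1}; X_{-n} = x\right) \right\|_{TV} \le 2\rho^m
\tag{59}
$$

for all $\theta$, $Y_{-\infty:-1}$, $Y^\epsilon_{1:\infty}$, $x$ and $n$, which follows immediately from (35). □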

Appendix B: Proofs of Lemmas 1, 2, 3 and 4. 

Proof of Lemma 1. We begin by observing that a straightforward
consequence of assumption (A3) is that for all $(\theta, \epsilon) \in \Theta \times [0, \infty)$, $r \ge 0$ and
sequences $y_{-r}, \ldots, y_1$

$$
\underline{c}_1 \le p^\epsilon_\theta(y_1 \mid y_{-r}, \ldots, y_0) \le \bar{c}_1.
\tag{60}
$$

Further by Lemma 13 it follows that the finite history conditional
likelihoods $p^\epsilon_\theta(y_1 \mid y_{-r}, \ldots, y_0)$ converge to the infinite history conditional
likelihoods $p^\epsilon_\theta(y_1 \mid \ldots, y_0)$ as $r \to \infty$ uniformly in $\theta$, $\epsilon$, the sequence of observations
$\ldots, y_0, y_1$ and initial distribution $\pi(x_{-r-1})$. Thus by (60) it follows that in
order to show continuity w.r.t. the first term and right continuity w.r.t. the
second term of the mapping $(\theta, \epsilon) \in \Theta \times [0, \infty) \mapsto l^\epsilon(\theta)$ it is sufficient to
show that these properties hold for the mapping

$$
(\theta, \epsilon) \in \Theta \times [0, \infty) \mapsto E_{\theta^*}\left[ \log p^\epsilon_\theta(Y_1 \mid Y_{-r:0}) \right]
\tag{61}
$$

for all $r \ge 0$. For the rest of the proof we shall assume an arbitrary fixed
$r \ge 0$ and initial distribution $\pi(x_{-r-1})$ are given. Observe that by (A2), (A3),
Lemma 7 and the dominated convergence theorem the mapping
$(\theta, \epsilon) \in \Theta \times (0, \infty) \mapsto g^\epsilon_\theta(y \mid x)$ is continuous w.r.t. its first term and right
continuous w.r.t. its second term for all $y \in \mathcal{Y}$ and $x \in \mathcal{X}$. Thus by a second
application of (A2), (A3) and the dominated convergence theorem one immediately
obtains these properties of the mapping $(\theta, \epsilon) \in \Theta \times (0, \infty) \mapsto p^\epsilon_\theta(y_1 \mid y_{-r}, \ldots, y_0)$ for
any $r \ge 0$ and sequence $y_{-r}, \ldots, y_1$. A final application of (A2), (A3) and
the dominated convergence theorem along with the inequality (60) yields that
the mapping $\Theta \times [0, \infty) \to \mathbb{R}$ given in (61) is also respectively continuous
and right continuous. In order to prove continuity w.r.t. the first term and
right continuity w.r.t. the second term of (61) on $\Theta \times [0, \infty)$ we shall show
for any sequences $\epsilon_n \searrow 0$ and $\theta_n \in \Theta \to \theta \in \Theta$, that

$$
E_{\theta^*}\left[ \log p^{\epsilon_n}_{\theta_n}(Y_1 \mid Y_{-r:0}) \right] \to E_{\theta^*}\left[ \log p_\theta(Y_1 \mid Y_{-r:0}) \right]
\tag{62}
$$

as $n \to \infty$. First note that

$$
E_{\theta^*}\left[ \log p^{\epsilon_n}_{\theta_n}(Y_1 \mid Y_{-r:0}) \right]
= E_{\theta^*}\left[ \log p^{\epsilon_n}_{\theta_n}(Y_{-r}, \ldots, Y_1) - \log p^{\epsilon_n}_{\theta_n}(Y_{-r}, \ldots, Y_0) \right].
$$

Thus in order to prove (62) it is sufficient to show that

$$
E_{\theta^*}\left[ \log p^{\epsilon_n}_{\theta_n}(Y_{-r}, \ldots, Y_1) \right] \to E_{\theta^*}\left[ \log p_\theta(Y_{-r}, \ldots, Y_1) \right]
\tag{63}
$$

and

$$
E_{\theta^*}\left[ \log p^{\epsilon_n}_{\theta_n}(Y_{-r}, \ldots, Y_0) \right] \to E_{\theta^*}\left[ \log p_\theta(Y_{-r}, \ldots, Y_0) \right]
\tag{64}
$$

as $n \to \infty$. We will now conclude the proof of the lemma by proving
(63) and observing that the proof of (64) follows in a completely identical
manner. The differences in the values of the likelihoods in (63) evaluated at
different parameter values $\theta_n$ and $\theta$ can be bounded by
|p^"jy_„...,yi)-p^"(y_„...,yi)| 



< 



/ n ^eAxk-i,Xk)gll{Yk\xk) 



+ 



Xr+3 
1 



^i{dx-r-l)^J^■{dx-r■.l) 



W qe{xk-i,Xk)gll{Yk\xk) 



k=—r 



'ir{dX-r-l)lJ.{dX-r:l) 



Jl qe{xk^i,Xk)gll{Yk\xk) 



k=—r 



7r{dX-r-l) fJ-idx-r-.i) 



1 



JJ qg{xk-i,Xk)gl"{Yk\xk) 



k=—r 



<C 



r+2 



(65) 



1 



1 1 

Xk-i,Xk)- Y[ Qe{xk-i,Xk 



k=—r 



TTg{dx^r~l) fJ-idx^r-.l) 

'ir{dx^r-l)fJ'{dx^r:l] 



k=~r 



+ C 



i""' E / H:{Yi\xi) - gl-{Yi\xi)\7r{dx^r-i)Kdx-r:i) 



(66) 



where $\bar{c}_1$ is as in (A3) and (66) follows from (A3), the definition of $g^\epsilon_\theta(\cdot \mid \cdot)$
and the telescopic identity

$$
\prod_{k=1}^{n} a_k b_k - \prod_{k=1}^{n} a_k \tilde{b}_k
= \sum_{l=1}^{n} \left( \prod_{k=1}^{n} a_k \right) \times (b_l - \tilde{b}_l) \prod_{k=1}^{l-1} b_k \prod_{k=l+1}^{n} \tilde{b}_k
$$

which holds for any collection of reals $a_1, \ldots, a_n$; $b_1, \ldots, b_n$; $\tilde{b}_1, \ldots, \tilde{b}_n$. Clearly
assumptions (A2) and (A3) and the dominated convergence theorem imply
that the quantity in (65) converges to zero as $\theta_n$ converges to $\theta$. Furthermore
the definition of $g^\epsilon_\theta(\cdot \mid \cdot)$ and convergence of $\theta_n$ to $\theta$ imply that for any $\delta > 0$
the limit supremum of (66) as $n \to \infty$ is bounded above by

Js'^r. Agl{y\xi)i^{dy) 

limSUpC^^"*^ I / ' 77^^ 'JT{dX-r-l)^M{dX-r:l) 

J=~r 



Xr+3 



(67)
where for all $x \in \mathcal{X}$ and $y \in \mathcal{Y}$

$$
\Delta^\delta g_\theta(y \mid x) = \sup_{\theta' : |\theta' - \theta| < \delta} \left| g_{\theta'}(y \mid x) - g_\theta(y \mid x) \right|.
$$
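
The telescopic identity invoked above to obtain (66) can be sanity-checked numerically. The following is only an illustrative sketch: the function names `direct_difference` and `telescoped_difference` are hypothetical and not taken from the paper, and the ordering of the products (untilded factors before index $l$, tilded factors after it) is one of two equivalent ways of writing the telescoping sum.

```python
import math
import random


def direct_difference(a, b, b_tilde):
    """Left-hand side: prod_k a_k*b_k - prod_k a_k*b~_k."""
    first = math.prod(ak * bk for ak, bk in zip(a, b))
    second = math.prod(ak * bk for ak, bk in zip(a, b_tilde))
    return first - second


def telescoped_difference(a, b, b_tilde):
    """Right-hand side: swap one factor b_l -> b~_l at a time and sum."""
    n = len(a)
    prod_a = math.prod(a)
    total = 0.0
    for l in range(n):
        head = math.prod(b[:l])            # factors before l keep b_k
        tail = math.prod(b_tilde[l + 1:])  # factors after l use b~_k
        total += prod_a * (b[l] - b_tilde[l]) * head * tail
    return total


random.seed(0)
a = [random.uniform(0.5, 2.0) for _ in range(6)]
b = [random.uniform(0.5, 2.0) for _ in range(6)]
b_tilde = [random.uniform(0.5, 2.0) for _ in range(6)]
gap = abs(direct_difference(a, b, b_tilde) - telescoped_difference(a, b, b_tilde))
print(gap)  # expected to be at the level of floating-point round-off
```

Because each summand contains exactly one difference $b_l - \tilde{b}_l$, bounding every other factor by a uniform constant turns the difference of products into a sum of single-factor differences, which is the form used in (66).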

It then follows from (A3), (67), the dominated convergence theorem and 
the Lebesgue differentiation theorem (see Wheeden and Zygmund (1977), 
Chapter 10) that 



limsupcj"''^ y~] I 

n^oo \l=-r J^""^^ 



\9gliyi\xi) - gl"{Yl\xi)\Tr{dx.r^l)f,{dX-r:l) 



(68) 



Agg{Yi\xi)TT{dx^r~i)Kdx- 



r:l J 



for any $\delta > 0$. Next we observe that by (A2) we have that $\lim_{\delta \to 0} \Delta^\delta g_\theta(y \mid x) = 0$
for all $y \in \mathcal{Y}$ and $x \in \mathcal{X}$ and hence that by applying (A3) and the dominated
convergence theorem to (68) we have that



limsupc^^^ \J / 



'iYl\xi) - gl"(Yl\xi)\TT{dx^r-l)Kdx^r:l) = 0. 



(69) 



Thus it follows from (60), (65), (66) and (69) that for almost all sequences
of observations $Y_{-r}, \ldots, Y_1$

$$
\lim_{n \to \infty} \log p^{\epsilon_n}_{\theta_n}(Y_{-r}, \ldots, Y_1) = \lim_{n \to \infty} \log p^{\epsilon_n}_{\theta}(Y_{-r}, \ldots, Y_1).
\tag{70}
$$

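Since

$$
p^\epsilon_\theta(Y_{-r}, \ldots, Y_1)
= \int_{\mathcal{X}^{r+3}} \prod_{k=-r}^{1} q_\theta(x_{k-1}, x_k)\,
  \frac{\int_{B^\epsilon_{Y_k}} g_\theta(y \mid x_k)\, \nu(dy)}{\nu(B^\epsilon_{Y_k})}\,
  \pi(dx_{-r-1})\, \mu(dx_{-r:1})
$$

we have that (63) now follows from (60), (70), the Lebesgue differentiation
theorem, (A3) and the dominated convergence theorem. □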
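
The role of the Lebesgue differentiation theorem in the argument above can be illustrated with a small numerical sketch. Everything specific here is an assumption made for illustration only and is not taken from the paper: a one-dimensional Gaussian observation density $g(y \mid x) = N(y; x, 1)$, Lebesgue measure for $\nu$, and the ball $B^\epsilon_y = [y - \epsilon, y + \epsilon]$. The function names are hypothetical.

```python
import math


def gaussian_pdf(y, x):
    """Assumed observation density g(y|x) = N(y; x, 1)."""
    return math.exp(-0.5 * (y - x) ** 2) / math.sqrt(2.0 * math.pi)


def gaussian_cdf(y, x):
    """Distribution function of N(x, 1) evaluated at y."""
    return 0.5 * (1.0 + math.erf((y - x) / math.sqrt(2.0)))


def ball_averaged_density(y, x, eps):
    """Average of g(.|x) over [y - eps, y + eps], i.e. the ABC-perturbed density."""
    mass = gaussian_cdf(y + eps, x) - gaussian_cdf(y - eps, x)
    return mass / (2.0 * eps)


y_obs, x_state = 0.3, 0.0
errors = [
    abs(ball_averaged_density(y_obs, x_state, eps) - gaussian_pdf(y_obs, x_state))
    for eps in (0.5, 0.1, 0.001)
]
print(errors)  # the gap shrinks as eps decreases
```

In this smooth toy case the ball-averaged density converges to the unperturbed density as $\epsilon \to 0$, which is the pointwise behaviour that the proof combines with (A3) and dominated convergence.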

Proof of Lemma 2. We start by showing that $l(\theta)$ is continuously
differentiable. First observe that by (A3), (A4), (A5), (36), (38), (42), (43), (44)
and the dominated convergence theorem we have that for arbitrary $x \in \mathcal{X}$

$$
\begin{aligned}
\lim_{n \to \infty} \log p_\theta(Y_1 \mid Y_{-n:0}; X_{-n} = x) &= \log p_\theta(Y_1 \mid Y_{-\infty:0}), \\
\lim_{n \to \infty} \nabla_\theta \log p_\theta(Y_1 \mid Y_{-n:0}; X_{-n} = x) &= \nabla_\theta \log p_\theta(Y_1 \mid Y_{-\infty:0})
\end{aligned}
\tag{71}
$$

uniformly in $\theta \in G$ and $\ldots, Y_0, Y_1$ and that the quantities in (71) are
uniformly bounded. It follows from (71) that

$$
l(\theta) = \lim_{n \to \infty} E_{\theta^*}\left[ \log p_\theta(Y_1 \mid Y_{-n:0}; X_{-n} = x) \right]
$$

and hence from (71) and Lemma 5 that $\nabla_\theta l(\theta)$ exists, is continuous and is
equal to $\lim_{n \to \infty} E_{\theta^*}\left[ \nabla_\theta \log p_\theta(Y_1 \mid Y_{-n:0}; X_{-n} = x) \right]$. Since $g^\epsilon_\theta(y \mid x)$ defined in
(6) satisfies all the conditions laid out in (A3)-(A5), the same conclusion
applies to $l^\epsilon(\theta)$. To prove the corresponding results for $\nabla^2_\theta l(\theta)$ observe that
by the Fisher information identity

V^Ee, [logpe{Yi\Y-n;o;X-n = x)] 

= Ee* [Velogpg{Yi\Y-n;o;X-n = x)VelogpeiYi\Y.n;0]X^n = xf] . 

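Existence and continuity of $\nabla^2_\theta l(\theta)$ then follows from (71) and Lemma 5
applied to the functions $\log p_\theta(Y_1 \mid Y_{-n:0}; X_{-n} = x)$. Furthermore the fact
that $\nabla^2_\theta l(\theta^*) = I(\theta^*)$ now follows from Douc, Moulines and Ryden (2004).
We begin the proof of (12) by observing that from (71) and the identity
$\log p_\theta(Y_1, \ldots, Y_n \mid X_1 = x) = \sum_{k=1}^{n} \log p_\theta(Y_k \mid Y_{1:k-1}; X_1 = x)$ we have that

$$
\nabla_\theta l(\theta) = \lim_{n \to \infty} \frac{1}{n} E_{\theta^*}\left[ \nabla_\theta \log p_\theta(Y_1, \ldots, Y_n \mid X_1 = x) \right]
$$

and similarly, for any $\epsilon > 0$ that

$$
\nabla_\theta l^\epsilon(\theta) = \lim_{n \to \infty} \frac{1}{n} E_{\theta^*}\left[ \nabla_\theta \log p^\epsilon_\theta(Y_1, \ldots, Y_n \mid X_1 = x) \right].
$$

Thus it is sufficient to show that there exists some positive constant $R$ such
that for any sequence $Y_1, \ldots, Y_n$, initial distribution $\pi_0$ and $\theta \in G$,

$$
\left| \nabla_\theta \log p_\theta(Y_1, \ldots, Y_n \mid X_1 = x) - \nabla_\theta \log p^\epsilon_\theta(Y_1, \ldots, Y_n \mid X_1 = x) \right| \le n R \epsilon.
\tag{72}
$$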

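For all $\theta \in G$, sequences $Y_1, \ldots, Y_n$ and $Y_1^\epsilon, \ldots, Y_n^\epsilon$ drawn from the original
and perturbed processes respectively and $x \in \mathcal{X}$

$$
\begin{aligned}
& \nabla_\theta \log p_\theta(Y_1, \ldots, Y_n \mid X_1 = x) - \nabla_\theta \log p_\theta(Y_1^\epsilon, \ldots, Y_n^\epsilon \mid X_1 = x) \\
&= \sum_{i=1}^{n} \left( \nabla_\theta \log p_\theta(Y_{1:i}; Y^\epsilon_{i+1:n} \mid X_1 = x) - \nabla_\theta \log p_\theta(Y_{1:i-1}; Y^\epsilon_{i:n} \mid X_1 = x) \right) \\
&= \sum_{i=1}^{n} \left( \nabla_\theta \log p_\theta(Y_i \mid Y_{1:i-1}; Y^\epsilon_{i+1:n}; X_1 = x) - \nabla_\theta \log p_\theta(Y_i^\epsilon \mid Y_{1:i-1}; Y^\epsilon_{i+1:n}; X_1 = x) \right) \\
&= \sum_{i=1}^{n} \left( \int \nabla_\theta g_\theta(Y_i \mid x_i)\, p_\theta(x_i \mid Y_{1:i-1}; Y^\epsilon_{i+1:n}; X_1 = x)\, \mu(dx_i) \right. \\
&\qquad\qquad \left. + \int g_\theta(Y_i \mid x_i)\, \nabla_\theta p_\theta(x_i \mid Y_{1:i-1}; Y^\epsilon_{i+1:n}; X_1 = x)\, \mu(dx_i) \right) \\
&\qquad \times \left( \int g_\theta(Y_i \mid x_i)\, p_\theta(x_i \mid Y_{1:i-1}; Y^\epsilon_{i+1:n}; X_1 = x)\, \mu(dx_i) \right)^{-1} \\
&\quad - \sum_{i=1}^{n} \left( \int \nabla_\theta g^\epsilon_\theta(Y_i^\epsilon \mid x_i)\, p_\theta(x_i \mid Y_{1:i-1}; Y^\epsilon_{i+1:n}; X_1 = x)\, \mu(dx_i) \right. \\
&\qquad\qquad \left. + \int g^\epsilon_\theta(Y_i^\epsilon \mid x_i)\, \nabla_\theta p_\theta(x_i \mid Y_{1:i-1}; Y^\epsilon_{i+1:n}; X_1 = x)\, \mu(dx_i) \right) \\
&\qquad \times \left( \int g^\epsilon_\theta(Y_i^\epsilon \mid x_i)\, p_\theta(x_i \mid Y_{1:i-1}; Y^\epsilon_{i+1:n}; X_1 = x)\, \mu(dx_i) \right)^{-1}.
\end{aligned}
\tag{73}
$$

In particular (73) holds true when $Y_1^\epsilon, \ldots, Y_n^\epsilon = Y_1, \ldots, Y_n$ and so (72)
follows from (A3), (A5), (A6), (A7) and (43). □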

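Proof of Lemma 3. Throughout this proof we shall assume that the
density of any finite collection of random variables from $\ldots, Y_0, Y_1, \ldots$ and
$\ldots, Y_0^\epsilon, Y_1^\epsilon, \ldots$ is computed assuming that the initial condition of the hidden
state process has the stationary distribution Fq*. We begin by observing
that from Douc, Moulines and Ryden (2004) we have that

$$
\begin{aligned}
I(\theta^*) &= \lim_{n \to \infty} \frac{1}{n} E_{\theta^*}\left[ \nabla_\theta \log p_{\theta^*}(Y_1, \ldots, Y_n) \cdot \nabla_\theta \log p_{\theta^*}(Y_1, \ldots, Y_n)^T \right], \\
I^\epsilon(\theta^*) &= \lim_{n \to \infty} \frac{1}{n} E_{\theta^*}\left[ \nabla_\theta \log p_{\theta^*}(Y_1^\epsilon, \ldots, Y_n^\epsilon) \cdot \nabla_\theta \log p_{\theta^*}(Y_1^\epsilon, \ldots, Y_n^\epsilon)^T \right].
\end{aligned}
\tag{74}
$$

By the Fisher identity we have that for any $1 \le k \le k'$ and subsequences
$\{j_1, \ldots, j_l\} \subset \{1, \ldots, k\}$ and $\{j'_1, \ldots, j'_{l'}\} \subset \{k+1, \ldots, k'\}$ that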

(75) Velogpe = ^ [v.logp, {Y^.,,;Y^^,.^,,) lY^.^Y^^^, 
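
The Fisher identity used in (75) states that the score of a marginal likelihood equals the conditional expectation of the complete-data score given the observed variables. The sketch below checks this numerically on a deliberately simple latent-variable model rather than on an HMM. All specifics are illustrative assumptions that do not come from the paper: a binary latent variable with $P(Z = 1) = \theta$, unit-variance Gaussian observations with assumed means `MU0` and `MU1`, and hypothetical function names.

```python
import math

MU0, MU1 = -1.0, 2.0  # assumed component means, for illustration only


def normal_pdf(y, mean):
    """Density of N(mean, 1) at y."""
    return math.exp(-0.5 * (y - mean) ** 2) / math.sqrt(2.0 * math.pi)


def marginal_log_lik(theta, y):
    """log p_theta(y) with the latent Z in {0, 1} summed out."""
    return math.log((1.0 - theta) * normal_pdf(y, MU0) + theta * normal_pdf(y, MU1))


def fisher_identity_score(theta, y):
    """E[ d/dtheta log p_theta(y, Z) | y ], the right-hand side of the identity."""
    joint1 = theta * normal_pdf(y, MU1)
    joint0 = (1.0 - theta) * normal_pdf(y, MU0)
    w1 = joint1 / (joint0 + joint1)  # posterior probability that Z = 1
    # complete-data score: 1/theta when Z = 1, -1/(1 - theta) when Z = 0
    return w1 / theta - (1.0 - w1) / (1.0 - theta)


def finite_difference_score(theta, y, h=1e-6):
    """Central-difference approximation of d/dtheta log p_theta(y)."""
    return (marginal_log_lik(theta + h, y) - marginal_log_lik(theta - h, y)) / (2.0 * h)


theta0, y0 = 0.3, 0.7
print(fisher_identity_score(theta0, y0), finite_difference_score(theta0, y0))
```

The two printed numbers should agree up to finite-difference error. In the proof the same identity is applied with the unobserved coordinates of the extended HMM playing the role of the latent variable.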

Further by construction of the perturbed process one can easily show that
given any $n$ and any subset $\{j_1, \ldots, j_l\} \subset \{1, \ldots, n\}$ that

$$
\nabla_\theta \log p_\theta(Y_1, \ldots, Y_n) = \nabla_\theta \log p_\theta(Y_1, \ldots, Y_n; Y^\epsilon_{j_1}, \ldots, Y^\epsilon_{j_l}).
\tag{76}
$$

Using (75) and (76) it follows that for any $1 \le i \le n$



Ve logpe* (yi:i,y4i;„) .Ve logpe* {Yi:i-uY, 
= Ee* [Velogpe* (yi;i_i,y4) .Velogpe* (Yi-.i-uY^, 
and hence that 

Eg, [(Veiogpe* (yi;i,y4ij - Veiogpe* (yi;.-i,y4)) 

• (Vgiogpg, (yi;„y4i^„) - Vgiogpe* (yi,_i,y4))' 



Vgiogpg. (yi:i,yi+i:n) -Vgiogpe. (yi:i,y4i:„; 



T 



(77) 



Velogp^, (^l:^-l,>^4) -Velogp^. (Yl:i^l,nr 

Using (77) and (74) and stationarity we have that 

i{e*)-r{e*) 

1 " - r/ 

= hm -VEe* Ve logpe* (yo|yi_i:_i;yi^^„_,) 



i=l 



lim Eg. 

n— >-oo 



-Ve logpe* (yoiyi-i:-i;y^„_, 
• (^Ve logpe* (yo|yi_i:-i;yi^:„_i) 

-ve logpe* (yoiyi-.-i;y^„_,)) 

Ve logpe* (yo|y-n:-i; yi^:J - Ve logpe* {Yo'\Y.n:-i;YU 
Ve logpe* {Yo\Y_n:-i;Y{J - Ve logpe* (yo^|y_„:_i; y^ J 



(78)
where the last equality follows from assumptions (A3) and (A5) and
equations (34), (43) and (53). In addition, using (42) and the dominated
convergence theorem, we conclude from (78) that

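$$
I(\theta^*) - I^\epsilon(\theta^*) = E_{\theta^*}\left[
\left( G_{\theta^*; Y_{-\infty:-1}; Y^\epsilon_{1:\infty}}(Y_0) - G^\epsilon_{\theta^*; Y_{-\infty:-1}; Y^\epsilon_{1:\infty}}(Y_0^\epsilon) \right) \cdot
\left( G_{\theta^*; Y_{-\infty:-1}; Y^\epsilon_{1:\infty}}(Y_0) - G^\epsilon_{\theta^*; Y_{-\infty:-1}; Y^\epsilon_{1:\infty}}(Y_0^\epsilon) \right)^T
\right]
\tag{79}
$$

where for any sequence $\ldots, Y_{-1}; Y_1^\epsilon, \ldots$

$$
\begin{aligned}
G_{\theta^*; Y_{-\infty:-1}; Y^\epsilon_{1:\infty}}(Y_0)
&= \lim_{n \to \infty} \nabla_\theta \log p_{\theta^*}(Y_0 \mid Y_{-n:-1}; Y^\epsilon_{1:n}) \\
&= \left( \int \left( \nabla_\theta g_{\theta^*}(Y_0 \mid x_0)\, p_{\theta^*}(x_0 \mid Y_{-\infty:-1}; Y^\epsilon_{1:\infty})
   + g_{\theta^*}(Y_0 \mid x_0)\, \nabla_\theta p_{\theta^*}(x_0 \mid Y_{-\infty:-1}; Y^\epsilon_{1:\infty}) \right) \mu(dx_0) \right) \\
&\quad \times \left( \int g_{\theta^*}(Y_0 \mid x_0)\, p_{\theta^*}(x_0 \mid Y_{-\infty:-1}; Y^\epsilon_{1:\infty})\, \mu(dx_0) \right)^{-1}
\end{aligned}
\tag{80}
$$

and

$$
\begin{aligned}
G^\epsilon_{\theta^*; Y_{-\infty:-1}; Y^\epsilon_{1:\infty}}(Y_0^\epsilon)
&= \lim_{n \to \infty} \nabla_\theta \log p_{\theta^*}(Y_0^\epsilon \mid Y_{-n:-1}; Y^\epsilon_{1:n}) \\
&= \left( \int \left( \nabla_\theta g^\epsilon_{\theta^*}(Y_0^\epsilon \mid x_0)\, p_{\theta^*}(x_0 \mid Y_{-\infty:-1}; Y^\epsilon_{1:\infty})
   + g^\epsilon_{\theta^*}(Y_0^\epsilon \mid x_0)\, \nabla_\theta p_{\theta^*}(x_0 \mid Y_{-\infty:-1}; Y^\epsilon_{1:\infty}) \right) \mu(dx_0) \right) \\
&\quad \times \left( \int g^\epsilon_{\theta^*}(Y_0^\epsilon \mid x_0)\, p_{\theta^*}(x_0 \mid Y_{-\infty:-1}; Y^\epsilon_{1:\infty})\, \mu(dx_0) \right)^{-1}.
\end{aligned}
\tag{81}
$$

Further by using assumptions (A3), (A5) and (34), (42), (43) and (44) we
have that the conditional likelihood functions $\log p_{\theta^*}(Y_0 \mid Y_{-n:-1}; Y^\epsilon_{1:n})$ and
$\log p_{\theta^*}(Y_0^\epsilon \mid Y_{-n:-1}; Y^\epsilon_{1:n})$ as well as their derivatives are bounded uniformly
in $\theta$ and $\ldots, Y_{-1}; Y_1^\epsilon, \ldots$ and that the derivatives are uniformly Cauchy in
$n$ whilst the conditional likelihoods themselves converge uniformly to the
quantities $\log p_{\theta^*}(Y_0 \mid Y_{-\infty:-1}; Y^\epsilon_{1:\infty})$ and $\log p_{\theta^*}(Y_0^\epsilon \mid Y_{-\infty:-1}; Y^\epsilon_{1:\infty})$. Hence we
can apply Lemma 5 to obtain

$$
\begin{aligned}
G_{\theta^*; Y_{-\infty:-1}; Y^\epsilon_{1:\infty}}(Y_0) &= \nabla_\theta \log p_{\theta^*}(Y_0 \mid Y_{-\infty:-1}; Y^\epsilon_{1:\infty}), \\
G^\epsilon_{\theta^*; Y_{-\infty:-1}; Y^\epsilon_{1:\infty}}(Y_0^\epsilon) &= \nabla_\theta \log p_{\theta^*}(Y_0^\epsilon \mid Y_{-\infty:-1}; Y^\epsilon_{1:\infty}).
\end{aligned}
\tag{82}
$$

It now follows from (79) and (82) that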

i{9*)-r{e*) 



Ea* 



En 



Ve logpe^ (yo|^-oo;-i; n-oo) - Ve logpe-(l^o1^-oo:-i; l^^oo) 



Ve logpe.(yo|^-oo;-i; Y{.,^) - Vg logpe*(l^o1^-oo:-i; n-o 
Velogpe*(>o|^-oo:-i;l^^oo) - Velogpe*(^o1^-oo:-i;n- 



Ve logpe*{Yo\Y^o.:-i;Y,':oo) " logpe*{Y^\Y^^:-i;Y{,^) ) |y-oo:-i; l^i:oo 

(83) 



Finally by applying the Fisher identity (75) and (76) to the conditional
laws $P_{\theta^*}(Y_0 \mid Y_{-\infty:-1}; Y^\epsilon_{1:\infty})$ and $P_{\theta^*}(Y_0^\epsilon \mid Y_{-\infty:-1}; Y^\epsilon_{1:\infty})$ we get that



VelogPe*{Yo\Y^oo:-i;Y,'..oo) " logp^. (Fol^-oo:-!; 



Velogpe*(i"o|>^-oo:-i;n-oo) - Velogpe-(>^o1^-oo:-i;l^^oo) l^^-oo:-!;^' 



1«. 



Velogpe*(lo|^-oo:-i;lT:oo)- 

Ve logpe* {Yo\Y^oo:-i; Yl^f\Y^^.,^i;Y{,^ 

Velogpe*(l^o1^-oo:-i;n-oo)- 

V, logpe-(1^0 l^-oo:-i; yi^:oof l^-oo:-i; >Tx 

*). 



(84) 

The result now follows from (83) and (84). 



□ 



Remark 9. Assume (A1)-(A5). Then in exactly the same way as one 
proves Lemma 3 one may prove that 



ue*) = p(e*) + Ee* i'"'-"'-' (e*) 

-i — cxd: — 1 )-i m:oo 
where for any pair of sequences $Y_{-\infty:-1}$ and $Y^\epsilon_{m:\infty}$ and any integer $m \ge 1$



m 



m 



Vg log po* (lo :m—l 

Vg logP0*(yo':m-l|5^-oo:-i; 



(85) 



Proof of Lemma 4. We begin by establishing part 1. From (24) and 
(84) we have for any l^-oo:-! and Y^ that ly " .ye {0*) > from which 
the first assertion of part 1 immediately follows. In order to prove the second 
assertion of part 1 we note that it is sufficient to prove that under the 



assumption of connectivity we must have that Eg* 
implies I {9*) = 0. Since we have by Remark 9 that E, 



1 Y^ \ 







Eg, 
Kg, 



o 



Y 1 Y^ 

— oo: — 1 1-* m;oo 
-^0:m-i:^cf.„_l 



for all m > 1 this will follow once we show that 



for all m > 1 implies 1(6*) = 0. 
Observe that by Lemmas 8 and 9 and the assumption that $\nu$ is connected
V *Ub^) is connected for all e > and m > 1 and thus 
from (A3) and (6) that the conditional laws Pe* (yo:m-i|^-oo:-i; ^rn.:oo) ^^"^ 
^s*{Yq.,i^_i\ Hoo:-i; ^m:oo) ^'^^o Connected for all e > and sequences 
Y-oo:-i and Y^.^. It then follows from (24) and (85) that for all m > 1 that 



Eg* 



Yo: 



1 -V"! 



(r) 



implies that 



Eg, 



Vglogpg*{Yo 

;m— 1 

Vg logp0.(yo I'm— 1 



Eg 



Vg log pg* (yo™-ll^-oo:-i; Y^ 



m:oo) 



Vg logpe.(yo'.„_i|y_oo:-i; 
for all $Y_{-\infty:-1}, Y^\epsilon_{m:\infty}$ a.s. and hence by Lemma 10 that

Vglogpg*{Yo;m-l\Y-oo:-i;Y^.^) = 

Pe*{yo:m-i\y-oo:-i',ym:oo) ^-S. and thus by Fisher's identity applied to the 
conditional probabilities that 



(86) 



Vg logpg*(Yo\Y-oo:-i; 







oo; — 1 



^m:oo) a.s. also. Finally observe that one can derive expres- 
sions for the gradients of the conditional densities pg*(Yo\Y-oo:-i; Y^.^) and 
pg* {Yo\Y-oc:-i) analogous to (80) and (82). It then follows from these and 
(A3), (A5), (35), (36), (37) and (57) that 

(87) Vglogpg*{Yo\Y^oo:-l)= lim log pg. (Fq |>^-oo:-l ; i;^;oo) 



a.s.. It now follows from (86) and (87) that if Eg* /'"^''^J^"' ((9*) = 

for all m > 1 then logpg* (lo|^-oo:-i) = a.s. and hence that I{6 ) = 0. 

Next we establish part 2. Recall that given a positive semi-definite matrix
$M \in \mathbb{R}^{d \times d}$ and a sequence of $\mathbb{R}^{d \times d}$ valued positive semi-definite matrices
$\{M_n\}_{n \ge 1}$ that $M_n \to M$ if and only if $v^T M_n v \to v^T M v$ for all $v \in \mathbb{R}^d$.
Thus in order to prove part 2 it is sufficient to show that for every sequence 



en oo and every v G M™ that v'^Kg* 
definition and stationarity 



v'^I{6*)v. By 



v'^Eg 



E 



ly'-'^^.y^ {en 

-r-oo:-l,-ri.oo ^ 



{v'^. {Vg logpg, {Yo\Y.oo:-i; l^^oo) " logpg* (yo11^-oo:-i; n-oo)))^ 



and 
(89) 



v'^Iv = E 



\Vglogpg* (yo|^-oo:-l))^ 



Further by assumptions (A3) and (A5) and (43) we have that there exists 
some K < oo such that 

V^'.Vglogpg, (yo|lloo:-l) , ^^^.V.logpr (lo |>^-oo;-l ; ^-oo ) , 

v^.Vglogpg.{Yo'\Y^^.,^i;Y{,^)<K 

a.s. for all $\epsilon > 0$ and sequences $\ldots, Y_{-1}; Y_1^\epsilon, \ldots$. The proof of the second part
of the result will follow from (88), (89) and (90) once we show that for any
$\ldots, Y_{-1}; Y_1^\epsilon, \ldots$

.g^. V.logpe* (yo1^-oo:-i;y^oo) ^ 0, 

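as $\epsilon \to \infty$ in $p_{\theta^*}(Y_0^\epsilon \mid Y_{-\infty:-1}; Y^\epsilon_{1:\infty})$ probability. The first part of (91) is a
straightforward consequence of applying Lemma 11 to the conditional laws
$P_{\theta^*}(Y_0^\epsilon \mid Y_{-\infty:-1}; Y^\epsilon_{1:\infty})$. To establish the second part of (91) observe that from
(34), (42), (44), (82) and (80) we have that there exists some $C < \infty$ such
that for all $\ldots, Y_{-1}; Y_1^\epsilon, \ldots$

$$
\begin{aligned}
\left| \nabla_\theta \log p_{\theta^*}(y \mid Y_{-\infty:-1}; Y^\epsilon_{1:\infty}) - \nabla_\theta \log p_{\theta^*}(y \mid Y_{-n:-1}; Y^\epsilon_{1:n}) \right| &\le C \rho^{n/2}, \\
\left| \nabla_\theta \log p_{\theta^*}(y \mid Y_{-\infty:-1}) - \nabla_\theta \log p_{\theta^*}(y \mid Y_{-n:-1}) \right| &\le C \rho^{n/2}
\end{aligned}
\tag{92}
$$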

for all $n \ge 1$, $\epsilon > 0$ and $y \in \mathcal{Y}$. It then follows from (A3), (A5) and (43),
the representation of the score functions $\nabla_\theta \log p_{\theta^*}(\cdot \mid \cdots)$ given in (73), the
representation of integrals w.r.t. the filter gradients given in (45)-(49) and
Lemma 12 that

$$
\nabla_\theta \log p_{\theta^*}(Y_0 \mid Y_{-n:-1}; Y^\epsilon_{1:n}) \to \nabla_\theta \log p_{\theta^*}(Y_0 \mid Y_{-n:-1})
\tag{93}
$$

in probability as $\epsilon \to \infty$. One can then conclude that the second part of (91)
holds via (92) and (93). In order to complete the proof of the lemma recall the
two random variables $G_{\theta^*; Y_{-\infty:-1}; Y^\epsilon_{1:\infty}}(Y_0)$ and $G^\epsilon_{\theta^*; Y_{-\infty:-1}; Y^\epsilon_{1:\infty}}(Y_0^\epsilon)$ defined in
(80) and (81). It follows from (A3), (A5) and the Lebesgue differentiation
theorem that $G^\epsilon_{\theta^*; Y_{-\infty:-1}; Y^\epsilon_{1:\infty}}(Y_0^\epsilon) \to G_{\theta^*; Y_{-\infty:-1}; Y^\epsilon_{1:\infty}}(Y_0)$ a.s. as $\epsilon \searrow 0$. Since
it follows from the proof of (82) that there exists some $K < \infty$ such that
for all $\epsilon > 0$ the functions $G_{\theta^*; Y_{-\infty:-1}; Y^\epsilon_{1:\infty}}(Y_0)$ and $G^\epsilon_{\theta^*; Y_{-\infty:-1}; Y^\epsilon_{1:\infty}}(Y_0^\epsilon)$ are
bounded above by $K$ for all $\ldots, Y_0; Y_1^\epsilon, \ldots$ we have that part 3 follows
from (84) and a simple application of the dominated convergence theorem.
Finally, part 4 is a trivial consequence of (79), (80) and (81) and assumptions
(A3), (A6) and (A7). □